AN  AUTONOMOUS  SYSTEM  FOR  QUANTIFYING 
THE  PERCEPTUAL  USE  OF  ACOUSTIC  SPEECH  CUES: 
VOICING  IN  INTERVOCALIC  /t~d/  IN  FRENCH 


By 


THOMAS ROBERT SAWALLIS


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 
1996 




Copyright  1996 
by 

Thomas  R.  Sawallis 


For  my  wife  and  my  mother, 
and  in  memory  of  my  father. 


"The key to the treasure is the treasure."
Scheherazade and the Genie,
in John Barth's Chimera


ACKNOWLEDGMENTS 


I must  be  doing  well  in  my  life,  with  so  many  people  to  be  grateful  for. 

Thanks  go  first  to  the  committee:  Bill  Sullivan  and  Caroline  Wiltshire, 
the  chair  and  cochair,  for  seeing  it  all  through  at  the  end;  to  Jean  Casagrande 
and  Sam  Brown  for  their  long  and  continuing  support  and  encouragement; 
and  to  Ira  Fischler  for  introducing  me  to  the  psychological  materials  which 
have  become  so  important  to  my  work. 

Without my mentors at the Institut de la Communication Parlée
(formerly the Institut de Phonétique) in Grenoble, I would never have
become a phonetician, and the main thanks (or maybe blame) go to Christian
Abry with his blasted amusing "reblochon" project. But it was Louis-Jean Boë
who quite accidentally changed the course of my life by an offhand diagram of
"phonemes analyzed into features analyzed into cues." And then of course
Christian taught me to think while Louis-Jean taught me to work. Even
more, they welcomed me back to do this dissertation project. Grand merci.

Alice  Faber  always  had  more  faith  in  me  than  I deserved,  and  has  done 
more  to  help  comfort  and  inspire  me  to  get  through  this  than  I could  ever 
repay.  She  has  been  an  emotional  and  practical  rock  of  stability  in  the  shifting 
grounds  of  my  academic  life  for  a decade,  and  I am  grateful  to  her  forever. 

David  Birdsong  was  also,  too  briefly,  a guide  in  techniques  for  melding 
linguistics  and  experimentation,  until  he  moved  on  to  broader  pastures. 

I would like to thank (in approximate chronological order) a number of
researchers with whom I have had discussions which were probably
insignificant to them, but to me were anything but: Leigh Lisker, Willy
Serniclaes, Dom Massaro, Jean-Luc Schwartz, Terry Nearey, Randy Diehl,
Dave Green, Bruno Repp, John Ohala, Jim Flege, and Bob Sorkin.

Then  there  are  the  friends  and  colleagues  in  and  around  grad  studies 
with  whom  I shared  excitement,  frustration,  snide  comments  and  beer  (or 
rather  coffee  with  the  Muslims).  I hope  the  thousands  of  you  I have 
accidentally  missed  will  forgive  me  and  those  I hereby  remember  to  thank: 
David Johns, Christian Benoît, Bruce Hamilton, John Bro, Krista Thoren,
Véronique Aubergé, Rabia Belrhali, Nour-Edine Achab, Sonia Kandel, Ted
Lacombe, Gérard Bailly, Rudolph Sock, Pierre Badin, Linda Northrup, David
Bogdan, Karen Eberly, Jan Lap, Ivy Silverman, Feng Gang, Vincent Chang.

For  refuge,  respite,  and  relief  from  my  studies,  I have  relied  on  my 
extralinguistic  friends,  and  I thank  them  profusely:  Glenn  Rampe,  Dick 
Burggraf,  Doug  Lanier,  Anthony  Adams,  Robin  Lauriault,  Ariel  Bloede, 
Roselyne  Roesch,  Therese  Lacoste,  Jeans  Petrissans,  and  my  Pasak  in-laws. 

Finally,  there  are  four  women  who  have  lightened  my  life  in  more 
ways  than  I can  begin  to  express.  God  knows  why  Barbara  Bloede  needed  a 
surrogate  son  with  six  kids  grown  already,  but  I would  never  have  survived 
France  without  her.  My  aunt  Catherine  Marion's  support,  good  humor,  and 
discretion have always been a delight. My mom, Nancy Sawallis, loves me of
course, but it is her combination of frankness and tolerance of my rare but
flagrant flaws that makes me like her as well as love her. My deepest thanks
go  to  my  wife,  Pamela  Pasak  Sawallis,  who  still  believes  in  me  despite 
everything.  Her  kindness  and  selflessness,  her  smile  and  her  laugh,  and  her 
unshakable  faith  in  me  are  all  my  joy.  I am  grateful  for  her  every  day. 

And  my  constant  gratitude  goes  to  my  late  father,  Robert  Funston 
Sawallis,  who  taught  me  to  love  knowledge  and  respect  wisdom. 




TABLE OF CONTENTS


ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTERS

1 INTRODUCTION
1.1 Statement of Problem
1.2 Statement of Purpose

2 LITERATURE REVIEW
2.1 Cues Exist
2.2 Cues Work in Natural Speech
2.3 There are Many Cues to the Same Percept
2.4 Cues Differ Across Languages
2.5 Cues Interact
Rankings
Trading Relations
Perceptual Decision Models
2.6 General Critique of the Literature

3 APPLYING SIGNAL DETECTION THEORY TO CUES
3.1 The Basics of Signal Detection
3.2 Applying the Mechanics of SDT
3.3 Structure of Cues in Speech
3.4 Standardizing Signal Strength Across Languages
3.5 Testing Cue Strengths Across Languages
3.6 The Specific Need for Signal Detection Theory
3.7 Recapitulation

4 PROCEDURAL OVERVIEW
4.1 Experiment Design
4.2 Corpus Design
Opposition
Cues
Language
4.3 Corpus
Tokens
Talker
Recording
Token Verification
Digitization

5 EXPERIMENT 1: PHONETIC SURVEY
5.1 Measurement Philosophy and its Implementation
5.2 Labeling Events
VVT (Vocalic Voice Termination)
VT (Voice Termination)
CFO (Consonant Frication Onset)
VO (Voice Onset)
VVO (Vocalic Voice Onset)
5.3 Durations
5.4 Intensity
RMS Amplitude Curves
Burst and Voicebar Measurements
Relative Amplitude Reference Point
Final Amplitude Measurement Technique
5.5 Label Verification
5.6 Final Cue Definitions (Recapitulation)
5.7 Results
Comparison with other studies
Application to Experiment 2

6 EXPERIMENT 2: PERCEPTION TEST PROCEDURES
6.1 Analogous Weakening Pattern Across Cues
6.2 Factorial Design within Cues
6.3 Calculating Sensitivities with the Stimulus Array
6.4 Stimulus Construction
Shortening [t]
Attenuating [t] Burst Amplitude
Attenuating [d] Voicing Amplitude
Lengthening [d]
6.5 Stimulus Presentation
Randomization and Order Effects
Stimulus Presentation and Subject Accommodation Difficulties
6.6 Materiel
6.7 Subjects' Task and Preparation
6.8 Subject Groups

7 EXPERIMENT 2: PROCESSING AND TABULATION OF PERCEPTUAL DATA
7.1 A Terminological Note
7.2 Coding the Response Sheets
7.3 Validating the Initial Responses
7.4 Raw Responses and Error Rates
7.5 The Problem of Statistical Analysis
General Strategy for Statistics
Confidence Intervals
7.6 Results

8 ANALYSIS AND COMMENTARY
8.1 General Aspects
8.2 The Duration of /d/
8.3 A Detailed Analysis and A Design Weakness
8.4 The Vowel Analysis
8.5 The Better Analyses
8.6 Conclusions
Effects
Trends
Methodological Issues
8.7 Recommendations for Redesign
The Base Rate
Statistical Models
Further Possible Improvements

9 CONCLUSIONS AND PROSPECTS
9.1 Innovations
9.2 Directions
9.3 Significance
9.4 Implications

APPENDICES

A SUBJECT MATERIALS
B RESPONSE DATA
C DATA TABLES
Sequencing across the Tables
Layout of Information Within the Tables
Response Proportions
d' Values
D ORIGINAL CONTRAST RESULTS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES


2.1 The Lisker List of Cues to Stop Consonant Voicing
3.1 Response Classifications in Signal Detection Theory
5.1 Phonetic Events and their Labels
5.2 Means and Standard Deviations of Measured Cues
6.1 Classification of Perceptual Stimuli
6.2 Pattern of Period Duplications for Lengthening of [d]
6.2 Numbers of Subjects per Group
7.1 Numerical Data on Overall Error Rates
8.1 Numerical Results for Mods NF DDur.
B.1 Responses to /ti/ Stimuli
B.2 Responses to /ta/ Stimuli
B.3 Responses to /tu/ Stimuli
B.4 Responses to /di/ Stimuli
B.5 Responses to /da/ Stimuli
B.6 Responses to /du/ Stimuli
B.7 Stimulus Assignments for Cue Factorial Arrays


LIST OF FIGURES


2.1 Cross-Language VOT Range Differences
2.2 Goodness Ratings of Stimuli as /pi/ as a Function of VOT
3.1 Schematic Representation of Sensitivity as the Separation of Two Distributions
3.2 The Subject's and Experimenter's Views of the Signal Detection Experiment
3.3 Bimodal and Monomodal Cue Contrasts
3.4 Equivalent Cue Settings in Two Hypothetical Languages
5.1 Location of Events in Sample Tokens of /dadat/ and /datat/
5.2 Measured Cue Distributions
7.1 A Graphic View of Group Error Rates
7.2 Contrast Reduction Results by Group
7.3 Contrast Reduction Results by Group Pool
7.4 Contrast Reduction Results by Cue
7.5 Contrast Reduction Results by Vowel
D.1 Original Contrast Results by Group
D.2 Original Contrast Results by Group Pool
D.3 Original Contrast Results by Cue
D.4 Original Contrast Results by Vowel




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 


AN  AUTONOMOUS  SYSTEM  FOR  QUANTIFYING 
THE  PERCEPTUAL  USE  OF  ACOUSTIC  SPEECH  CUES: 
VOICING  IN  INTERVOCALIC  /t~d/  IN  FRENCH 


By 


Thomas  Robert  Sawallis 


August,  1996 


Chairman:  William  J.  Sullivan 

Cochair:  Caroline  Wiltshire 

Major  Department:  Program  in  Linguistics 

Research has shown that phonemic decisions in speech perception are
influenced by the information in multiple speech cues, and that those cues
are produced with different settings across different contexts,
including different languages. However, there has never been an accepted
method for quantifying the perceptual use or importance (often termed
"weight") of cues in a way that would allow cross-language comparison,
perhaps the most basic of linguistic study types. The existing assessment
methods are either limited or designed in ways that make them incapable of
rendering such a comparison.

This research proposes a quantitative, independent, abstract, and
autonomous measurement system for a cue's perceptual importance. The
system begins with a survey of the cue's acoustic measurements in natural
speech, since a memory of that distribution is the presumed basis for the
listener's expectations of that cue. Then perceptual stimuli are designed
according to a pattern based on the standard deviation of the cue's surveyed
distribution, which has the effect of normalizing the step size in the stimulus
series across cues in all possible environments, including language. Finally,
the stimuli are played to listeners and the responses are compared using
mathematical techniques from Signal Detection Theory, specifically the d'
sensitivity measure. This has the advantage of quantifying the cue's
importance independently of the overall performance accuracy, which can
vary for reasons extraneous to the perceptual system, and the quantification is
done on an abstract scale suitable for any cue in any context in any language.

The  measurement  system  is  tested  on  four  cues  for  the  voicing  contrast 
of intervocalic /t~d/ in French, with native and nonnative listeners, and the
results, though limited, are promising. For instance, native French speakers
are shown to use hold duration (as operationalized) more in perceiving /t/
than  in  perceiving  /d/.  Also,  certain  weaknesses  are  discovered  in  the 
experiment  design  used  to  implement  the  measurement  system,  and 
remedies  are  proposed  to  allow  more  and  stronger  conclusions  as  the 
measurement  system  is  refined  and  developed. 




CHAPTER  1 
INTRODUCTION 


The pioneering work at Haskins Laboratories in the 1950s (see Borden,
Harris,  & Raphael,  1994,  and  Liberman,  1993,  for  review)  showed  that 
phonetic  contrasts  are  signaled  by  multiple  acoustic  properties,  termed  cues, 
from  which  the  listener  extracts  information  early  in  the  process  of  speech 
perception.1  Since  the  "discovery"  of  cues,  one  important  goal  of  acoustic 
phonetics  research  has  been  the  understanding  of  how  cues  are  used  in 
speech  perception.  Part  of  that  goal  is  concerned  with  determining  which 
cues  are  more  important  in  a particular  contrast  and  which  are  less. 

The  impetus  behind  this  dissertation  was  the  simple  question  of 
whether  we  know  how  to  compare  the  importance  of  any  given  acoustic  cue 
across  two  (or  more)  languages.  A valid  cross-language  comparison  would 
require  a generalizable  autonomous  metric  to  quantify  the  perceptual 
importance  of  a given  cue  experimentally.  Several  distinct  bodies  of 
literature  (discussed  in  Chap.  2)  address  this  topic  at  least  tangentially  but  do 
not  supply  such  a metric,  so  it  appears  none  has  yet  been  developed. 

This dissertation describes a new method of determining the
importance of a cue using Signal Detection Theory. The method can be
applied to any cue for any phonemic contrast in any environment in any
language and furnishes estimates of a cue's importance (or weight) across the
most important part of its range. This method is more generalizable than any
other currently in use, and the autonomous cue weights it generates could be
useful in fields beyond speech perception, especially phonology and speech
technology.

1 Ultimately, the information is used to identify the phonological units,
whether features, phonemes, or syllables, which are used for lexical access.


1.1  Statement  of  Problem 

Despite  all  we  do  know  about  cues,  we  do  not  currently  have  a clear 
way  of  quantifying  the  actual  use  of  a given  cue  in  perception.  This  is  an 
important  lacuna  in  our  understanding  of  cue  function  in  speech  perception. 
The  techniques  discussed  below  (section  2.4)  do  not  suffice.  Their  findings  are 
limited  to  the  specific  situation  investigated  and  they  do  not  provide  a metric 
for  general  quantification  which  would  allow  comparison  across  cues, 
features,  phonological  environments,  or  languages. 

One  problem  can  be  illustrated  by  the  following  gedanken  experiments. 
Consider  two  languages,  T and  L,  both  of  which  have  two  series  of  stop 
consonants.  Many  of  the  cues  of  the  Lisker  List  (Lisker,  1986,  discussed  below 
in  section  2.3)  would  be  expected  to  contrast  in  the  two  series.  If  we  knew  that 
T was  a tone  language  and  L had  contrasting  vowel  length,  we  might  suspect 
some  sort  of  cognitive  optimization  in  stop  perception  whereby  T would 
favor  cues  involving  fundamental  frequency  and  L cues  with  temporal 
impact.  We  would  not  be  surprised,  after  appropriate  measurements,  to  find 
that T had less variability in F0 contour than L, nor that L had less VOT
variability than T. In careful speech, we might expect to find that T
exaggerated F0 and L exaggerated VOT. We might accept such measurements
as sufficient evidence that T relied more on F0 contour than some
"unmarked" ideal language, and likewise L on VOT. Or we might construct
synthetic series for perception experiments, varying F0 contour in 25 Hz steps
and VOT in 5 ms increments. Seeing a "closer" boundary for T's F0 contour
than L's, and for L's VOT than T's, we would consider the case proven.

But  consider  two  other  languages,  TT  and  LL,  with  the  same 
phonological  patterns  as  T and  L,  but  unlike  T and  L in  that  they  have 
identical  statistical  distributions  of  the  measured  acoustic  cues  and  equal 
boundaries  in  the  perception  test.  Can  we  consider  that  TT  and  LL  make 
identical  use  of  these  cues?  No,  because  it  is  still  possible  that  the  perceptual 
system  uses  the  cues  differently  in  the  two  languages,  for  instance  through 
different  weighting  factors.  In  fact,  T and  L could  also  have  identical 
weighting  factors,  even  though  their  cue  measurement  distributions  and 
identification  boundaries  are  different. 
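The first gedanken experiment assumes that a 5 ms VOT step means the same thing in T and in L. A minimal Python sketch with invented numbers suggests why it need not: if the two languages' VOT distributions differ in spread, one raw step covers a different fraction of each distribution. Everything here is hypothetical: the names `CueDistribution`, `step_in_sd_units`, and `stimulus_series`, the shared mean of 60 ms, and the standard deviations (10 ms for T, 4 ms for L, matching the scenario in which L has less VOT variability). The series builder spaces cue values in equal standard-deviation steps, which is one possible reading of a stimulus pattern "based on the standard deviation of the cue's surveyed distribution," not the stimulus design actually used in this study.

```python
from dataclasses import dataclass


@dataclass
class CueDistribution:
    """Summary statistics of one cue for one category in one language.

    Hypothetical illustration: the values used below are invented.
    """
    language: str
    mean: float  # e.g. mean VOT in ms
    sd: float    # standard deviation, in the same units as mean


def step_in_sd_units(raw_step: float, dist: CueDistribution) -> float:
    """Express a raw step (in the cue's physical units) in SD units."""
    if dist.sd <= 0:
        raise ValueError("standard deviation must be positive")
    return raw_step / dist.sd


def stimulus_series(dist: CueDistribution, n_steps: int,
                    step_sd: float, direction: int = -1) -> list[float]:
    """Cue values spaced in equal SD steps away from the category mean.

    Value k is mean + direction * k * step_sd * sd, for k = 0..n_steps.
    Which direction points toward the competing category is an assumption
    the caller must supply.
    """
    return [dist.mean + direction * k * step_sd * dist.sd
            for k in range(n_steps + 1)]


if __name__ == "__main__":
    # Invented values for the hypothetical languages T and L.
    vot_T = CueDistribution("T", mean=60.0, sd=10.0)
    vot_L = CueDistribution("L", mean=60.0, sd=4.0)
    print(step_in_sd_units(5.0, vot_T))    # 0.5
    print(step_in_sd_units(5.0, vot_L))    # 1.25
    print(stimulus_series(vot_L, 3, 0.5))  # [60.0, 58.0, 56.0, 54.0]
```

With these invented values, the same 5 ms step is half a standard deviation in T but 1.25 standard deviations in L, so a series stepped in raw milliseconds would sample the two distributions unevenly.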

This  pair  of  gedanken  experiments  illustrates  an  important  logical 
possibility  in  the  structure  of  phonetic  perception  which  has  been 
insufficiently  explored  in  the  literature.  The  phonetic  setting  specified  by  a 
language  for  a particular  cue  may  not  by  itself  determine  the  level  of 
perceptual  use  of  that  phonetic  information.  The  literature  includes 
numerous  and  classic  examples  of  acoustic  phonetic  studies  where  the  results 
are  taken  to  reveal  language-specific  cue  settings,  and  many  where  these 
settings  are  compared  across  languages.  In  contrast,  there  are  few  studies 
attempting  to  quantify  perceptual  use  of  cues,  fewer  still  attempting  to  do 
cross-language  comparisons  of  cue  use,  and  apparently  none  attempting  both 
at  once.  What  would  be  required  to  combine  and  accomplish  these  two  goals? 

In  order  to  appropriately  compare  the  effective  perceptual  use  of  cues 
across  languages,  a measure  is  needed  which  is: 

Quantitative:  not  just  a rank,  because  ranks  can  hide  vast  differences; 

Independent: not a ratio of one cue to another, because some languages
might have different inventories of cues (or features, or
phonemes, or contrasts), and because independence avoids a "factorial
explosion" of experimental conditions;

Abstract: not tied to any acoustic scale, because the cues already
known are measured on a variety of different physical
scales and hence cannot be directly compared; and

Autonomous: not linked to a specific cue, because measurement of a
perceptual effect of putative cues yet to be discovered is essential to
description of perceptual structures within and across languages.

A metric of this sort would constitute direct evidence of a language's use of a
cue, and would allow comparison of a cue's importance across languages.

Of  course  such  a metric  would  be  appropriate  for  much  broader 
application  than  just  the  cross-language  comparisons  envisaged  here.  It 
should  also  be  used  for  comparison  of  cues  within  a single  language  but 
across  phonemes,  within  a single  phoneme  but  across  environments,  and  so 
on.  In  fact,  it  would  be  a generalized  metric  capable  of  measuring  any  cue,  in 
any  situation,  for  any  purpose. 

1.2  Statement  of  Purpose 

The  purpose  of  this  research  is  twofold:  first,  to  establish  a 
measurement  technique  capable  of  quantifying  the  perceptual  use  of  cue 
information  in  a form  allowing  valid  and  practical  cross-language 
comparisons,  and  second,  to  demonstrate  the  use  of  this  measurement  in  a 
limited  data  set. 

The measurement is the d' sensitivity measurement from signal
detection  theory,  adapted  to  speech  perception  by  appropriately  defining 
signal  and  noise,  and  rendered  autonomous  with  regard  to  both  cue  and 
language  by  defining  the  cue's  signal  strength  relative  to  the  cue's  observed 
distribution  in  the  language. 
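The arithmetic of d' itself is standard: sensitivity is the difference between the inverse-normal (z) transforms of the hit rate and the false-alarm rate. The following Python sketch assumes a yes/no-style tabulation in which some trials are designated "signal" and others "noise"; how signal and noise are defined for a speech cue is developed in Chapter 3 and is not modeled here, and 2AFC designs are sometimes analyzed with a different conversion that this sketch does not cover. The function names are hypothetical, and the log-linear correction (adding 0.5 to each count and 1 to each total, so that rates of 0 or 1 stay finite) is one common convention, assumed here rather than taken from this study.

```python
from statistics import NormalDist


def corrected_rate(count: int, total: int) -> float:
    """Proportion with a log-linear correction: (count + 0.5) / (total + 1).

    Keeps the rate strictly between 0 and 1, where the inverse normal is
    finite. The choice of correction is an assumption of this sketch.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return (count + 0.5) / (total + 1)


def d_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """Sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_signal)
    fa_rate = corrected_rate(false_alarms, n_noise)
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(d_prime(50, 100, 50, 100), 3))  # equal rates give d' = 0
    print(round(d_prime(45, 50, 10, 50), 3))    # more hits than false alarms: d' > 0
```

Because d' is expressed in standard-normal units rather than in milliseconds or decibels, the resulting number is not tied to any particular acoustic scale, which is the sense in which the text calls the measure abstract.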




The measurement is then demonstrated in an experiment where four
cues in a corpus of French intervocalic [t - d] are surveyed, then manipulated,
and finally presented to both native and non-native speakers of French in a
two-alternative forced-choice (2AFC) perceptual test. Thus, the new measure is
used to test whether: 1) All cues are equally important overall, with each token
judged solely by its place in the language's expected distribution; and 2) A cue's
importance is identical for all human languages (irrespective of inter-cue
equality under number 1), and is thus unaffected by native language.

The  results  for  French  show  that  the  new  measure  can  reveal 
differences  in  perceptual  importance  across  cues  within  French,  as  well  as 
differences  in  the  perceptual  importance  assigned  to  cues  by  native  and 
nonnative  speakers. 


CHAPTER  2 
LITERATURE  REVIEW 


Modern  research  into  cues  began  in  the  1940s,  and  the  themes  treated 
in  that  research  changed  as  certain  questions  were  resolved  and  new 
questions  presented  themselves.  The  quantification  scheme  addressed  in  this 
study  is  only  indirectly  related  to  some  of  these  themes,1  but  it  is  significantly 
shaped  by  these  five:  1)  Cues  exist.  2)  Cues  work  in  natural  speech.  3)  There 
are  many  cues  to  the  same  percept.  4)  Cues  differ  across  languages.  5)  Cues 
interact.  Some  of  the  important  literature  on  these  themes  is  outlined  below. 

2.1  Cues  Exist 

The dual assets of the sound spectrograph, then recently introduced
into speech research, and the Pattern Playback, which Haskins researchers had
just invented, gave Haskins Labs in the 1950s an unprecedented ability to both investigate
the  fine  acoustic  structure  of  natural  speech  and  test  the  perception  of  that 
structure  with  synthetic  speech.  Both  of  these  technologies  allowed  the 
processing  of  a great  deal  more  data  with  a great  deal  more  finesse  than  had 
previously  been  possible  with  oscilloscopes,  kymographs,  and  recording 
techniques. 

The earliest work with these complementary technologies was
concerned with verifying and exploring the role of formants in vowels, and
with the analogies between acoustic and articulatory vowel spaces. However,
researchers soon began to examine the rich detail of the various classes of
consonants, and especially the stop consonants. The first such study
published, Liberman et al. (1952), described the effect of release burst frequency
on perceived place of articulation of unvoiced stop consonants. Stops with
high bursts were perceived as apical, those with bursts just above the F2 (and
to some extent, the F1) of the following vowel as velar, and the rest as bilabial.
The next study, Cooper et al. (1952), added the effect of second formant (F2)
transitions on perceived place and that of F1 transitions on perceived voicing.
A transition rising into the F2 was perceived as a bilabial, while flat or falling
F2 onset transitions were perceived as apical or velar, depending on the
frontness or backness of the following vowel. Rising F1 onsets were
perceived as voiced, and flat onsets as voiceless. Delattre et al. (1955)
systematized the work on formant transitions with the idea of the "locus," an
origin point to which the transitions seemed to point. The F2 locus for /b/
was 720 Hz, for /d/ 1800 Hz, and for /g/ 3000 Hz.2 The F1 locus was identical
for all three, at or below 240 Hz (the lower limit of their technical capacity).

1 An example: Is the cognitive representation of speech primarily articulatory
(Fowler, 1986; Liberman & Mattingly, 1985), auditory/acoustic (Diehl &
Kluender, 1989), or linguistic (Remez, 1994)?

2 This value for /g/ applied only to vowels with F2 above about 1200 Hz,
essentially the front vowels.

Research slowly extended beyond initial stops, consistently showing
that cues contribute to phoneme perception. Liberman et al. (1956) showed
that duration of transition determined whether a stimulus was perceived as a
stop or glide before a monophthong, or simply as a diphthong. Lisker (1957a)
showed closure duration affected the perception of voicing in medial stops.
O'Connor et al. (1957) revised locus theory and first found the inescapable
need for F3 through work on the glides and liquids [w, j, r, l] in initial
position, while Lisker (1957b) corroborated their findings, using a different
methodology on the same segments in intervocalic position. Harris (1958)
showed that some fricatives are adequately signaled by their noise spectrum,
but others require formant transitions as well. Harris et al. (1958)
systematically studied the effect of F3 transitions on place perception for
voiced stops. Liberman et al. (1958) showed that noise excitation (as opposed
to periodicity) and delayed F1 onset (termed "cutback") favored perception of
voicelessness in initial stops. Hoffman (1958) performed a massive study
simultaneously manipulating burst frequency and F1 and F2 transitions in
initial stops and showing their interaction in perception. Liberman et al.
(1959) took stock of how the phonetic details discovered up to that time could
be incorporated in speech synthesis by rule.

As  important  as  the  phonetic  details  reported  in  this  literature  were  the 
development  of  the  concept  of  cues  and  the  application  of  that  term  to  the 
phonetic  phenomena  discussed  above.  Liberman  et  al.  (1952)  may  have 
written  the  first  article  on  speech  to  use  the  term  "cue,"3  but  it  also  uses  other 
terms  as  synonyms:  "essential  stimulus-correlates,"  "irreducible  acoustic 
correlate,"  "sound  patterns,"  "stimulus-essentials,"  "acoustic  variable,"  and 
"irreducible  acoustic  stimulus."  In  Cooper  et  al.  (1952),  "cue"  is  used 
confidently  in  parts  of  the  text  but  not  at  all  in  other  parts,  as  though  the  parts 
were  written  either  at  different  times  or  by  different  people.  Still,  by  1954  it 
seems  the  Haskins  researchers  were  discussing  both  the  term  and  the  idea  of 
"cue"  with  psychologists  (Liberman  et  al.,  1954;  Liberman  et  al.,  1956),  speech 
researchers  (Delattre  et  al.,  1955)  and  linguists  (Lisker,  1957a;  O'Connor  et  al., 
1957;  Lisker,  1957b). 

3 It may be interesting to note that the following article in the same journal
uses "cues" in the title and "perceptual cues" in the text and addresses visual
illusions rather than speech. Perhaps "cue" was commonly used by
psychologists, and was appropriated for use in speech.




Though Haskins-written texts seem not to have supplied an exact
definition, the common conceptual background of phonetics by the late
1950s4 seems to have included a view of cues as the acoustic phenomena of
speech which affect phonemic identification of segments.5 That is certainly
the  point  of  view  in  Liberman  (1957)  and  that  which  Delattre  expressed  to  the 
French-speaking  research  community  in  his  overview  article,  "Les  indices 
acoustiques  de  la  parole:  Premier  rapport"  (The  acoustic  cues  of  speech:  First 
report,  1958). 


2.2  Cues  Work  in  Natural  Speech 

Most  of  the  research  discussed  in  the  preceding  section  was  carried  out 
on the Haskins Pattern Playback, with all the associated simplification and
idealization  inevitable  in  the  execution  of  hand-painted  spectrograms.  It  was 
natural  to  hope  for  a confirmation  of  their  findings,  and  of  the  general 
usefulness  of  cues,  in  natural  speech  as  well.  Three  early  studies  adapted  the 
tape-splicing  technique  (seemingly  first  reported  in  Harris,  1953)  to  confirm 
the  operation  of  cues  in  natural  speech. 

Schatz (1954) set out to replicate Liberman et al. (1952) by cross-splicing
consonant bursts with different vowels, but ran afoul of the dual effects of
both consonant and vowel on the intervening aspiration-excited transitions.
A follow-up experiment cross-spliced unaspirated [sk] clusters with [id, ar, ul],
as well as [k] bursts with [hip, hap, hup] (with place-neutral [h] shortened to
an appropriate substitute for normal stop aspiration). Her results
corroborated the findings of Liberman et al. regarding the effect of burst noise
frequency on perceived place of articulation.

4 Witness in this regard the numerous non-Haskins articles reprinted in
Lehiste (1967).

5 Sometimes researchers seemed to wish to reserve the term for only the
phenomenon which most affects phonemic identification, as in the title of
Liberman et al. (1959): "Minimal Rules for Synthesizing Speech". This
preference eventually died out, and the term "cue" came to be used for all
such phenomena.

Malecot  (1956)  extended  and  partly  replicated  Liberman  et  al.  (1954) 
with  regard  to  nasal  consonant  place  of  articulation.  He  cross-spliced  a variety
of  stimuli:  steady-state  nasal  consonants,  steady-state  vowels,  nasal  resonances
and  vowels  from  syllables  with  initial  nasal  consonants,  and  vowels  from 
syllables  with  initial  voiced  stops.  He  concludes  that  the  transitions  "serve  as 
place  cues,"  while  the  nasal  consonant  closure  resonances  "are  not  completely 
neutral  with  respect  to  place  identification;  they  contain  a small  amount  of 
place  information."6 

Lisker  (1957a)  used  tape  splicing  to  investigate  the  cue  value  of  closure 
duration  in  post-stressed  intervocalic  stops.  After  surveying  the  durations  in 
a small  corpus,  he  spliced  a range  of  durations  of  blank  tape  into  the  closure 
of  a token  of  "rupee"  and  played  them  for  identification  to  a group  of
listeners.  The  resulting  identification  curve  showed  the  amount  of
shortening  of  [p]  required  to  overcome  the  other,  presumably  spectral,  [p]  cues
and  achieve  a percept  of  [b].  The  analogous  operation  on  "ruby"  was
somewhat  less  natural,  replacing  closure  voicing  with  silence,  but  still
showed  the  extra  lengthening  required  for  a [p]  percept.  In  short,  closure  duration
clearly  functioned  as  a cue.  Moreover,  with  both  kinds  of  cross-spliced 
stimuli,  "rub-pee"  and  "rup-by",  he  found  intermediate  values,  which 
showed  that  other  [p-b]  cues  must  exist  on  either  side  of  the  closure. 
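Identification curves like Lisker's are usually summarized by their crossover point, the stimulus value at which listeners give each response half the time. The following is a minimal sketch, assuming a hypothetical set of response proportions (not Lisker's data) and simple linear interpolation between adjacent continuum steps; published studies often fit a smooth function instead. The function name `crossover_point` is illustrative only.

```python
def crossover_point(stimulus_values, proportions, criterion=0.5):
    """Estimate where an identification curve crosses `criterion`.

    stimulus_values: continuum steps in increasing order (e.g. closure durations, ms).
    proportions: proportion of one response (e.g. /b/) at each step.
    Returns the interpolated stimulus value, or None if the curve never crosses.
    """
    points = list(zip(stimulus_values, proportions))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # The crossover lies in this interval if the two steps straddle the criterion.
        if y0 != y1 and (y0 - criterion) * (y1 - criterion) <= 0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical data: proportion of /b/ responses as closure duration grows.
durations_ms = [40, 60, 80, 100, 120, 140]
prop_b = [1.0, 0.95, 0.8, 0.4, 0.1, 0.0]
print(crossover_point(durations_ms, prop_b))  # about 95 ms
```

Here the curve falls from 0.8 to 0.4 between 80 and 100 ms, so the interpolated 50% point lies three quarters of the way along that step.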


6 Cf.  note  5 above:  Malecot  apparently  declines  to  use  the  term  "cue"  for
the  place  information  in  the  nasal  closure,  since  he  has  shown  that  the
transition  cue  is  stronger.




These  three  studies,  and  others  which  followed,  were  convincing 
evidence  that  those  cues  which  had  been  demonstrated  in  synthetic  speech 
functioned  in  the  perception  of  natural  speech  as  well. 

2.3  There  are  Many  Cues  to  the  Same  Percept

Many  of  the  studies  cited  in  the  two  previous  sections  document, 
either  individually  or  collectively,  the  existence  of  multiple  cues  for  the  same 
phonemic  contrast: 

stop  consonant  place  by  burst  (Liberman  et  al.,  1952)  and  transition 
(Cooper  et  al.,  1952), 

nasal  place  by  murmur  and  transition  (Malecot,  1956), 

fricatives  by  spectrum  and  transition  (Harris,  1958), 

liquids  ([w,  j,  r,  l])  by  formants  and  transitions  (O'Connor  et  al.,  1957;
Lisker,  1957b)  and  by  transition  duration  (Liberman  et  al.,  1956).

The  same  two  sections  note  research  showing  multiple  cues  for  the  voicing 
contrast  in  stop  consonants: 

F1  transitions  (Cooper  et  al.,  1952),

closure  duration  (Lisker,  1957a),

transition  excitation  and  F1  onset  delay  (Liberman  et  al.,  1958).

As  time  passed  and  more  research  was  done,  stop  consonant  voicing 
became  the  archetypal  example  of  a phonological  contrast  signaled  by 
multiple  cues,  partly  for  reasons  discussed  below  (section  2.4),  but  primarily 
because  of  the  richly  detailed  acoustic  structure  evident  in  stop  consonants. 
This  can  be  seen  in  the  arguments  in  Parker  (1977),  Lisker  (1978),  and  Edwards 
(1981).  The  most  concise  statement  of  the  multiple  nature  of  cues  may  be
Lisker  (1986),  "Voicing  in  English:  A Catalogue  of  Acoustic  Features
Signaling  /b/  Versus  /p/  in  Trochees,"  which  lists  the  following  cues:

Table  2.1

The  Lisker  List  of  cues  to  stop  consonant  voicing  (adapted  from  Lisker  1986).

Cue                                     Pre-Closure   Closure   Post-Closure
duration of vowel                            X
decay time of signal                         X
duration of closure                                      X
duration of glottal signal                               X
intensity of glottal signal                              X
release burst intensity                                              X
F1 transition duration                       X                       X
F1 termination/onset frequency               X                       X
termination/onset of F1 transition           X                       X
  (i.e., "F1 cut forward" and "F1 cutback")
timing of voice termination/onset            X                       X
  (i.e., VTT and VOT)
F0 contour                                   X                       X

Lisker  explicitly  states  that  this  formulation  of  the  list  is  not  intended
as  fixed,  definitive,  or  unalterable.  There  is,  however,  a firm  consensus
among  phoneticians  that  multiple  cues  typically  affect  the  perception  of  a
single  phonological  feature.




2.4  Cues  Differ  Across  Languages 

Early  work  on  cues  was  reported  as  though  there  were  no  appreciable 
differences  between  phonetically  analogous  segments  in  different  languages. 
An  investigation  into  the  F2  transition  for  velars,  for  instance,  would  be 
presented  as  though  it  were  valid  for  all  languages  with  velar  consonants. 
This  disingenuous  position  became  untenable  during  the  mid-  to  late  1960s,
as  cross-language  studies  were  published.7 

The  first  classic  study  was  again  by  Lisker  and  Abramson  (1964).  They 
examined  VOT  in  initial  stop  consonants  in  eleven  languages,  including 
languages  with  2,  3,  and  4 stop  series.  The  ability  of  different  languages  to 
marshal  the  same  phonetic  tools  (essentially,  voicing  and  aspiration)  and 
arrive  at  different  numbers  of  opposing  stop  series  had  been  a long-standing 
phonological  problem.  Determining  an  underlying  phonetic  explanation  for 
this  phenomenon  would  be  an  important  step  forward.  Lisker  and 
Abramson  found  that  for  all  the  languages  studied,  a stop  series  with  VOTs  in 
the  range  of  about  0-25  ms  was  opposed  to  either  a series  lower  than  about  -75  ms
or  one  higher  than  about  50  ms,  or  both.8  (From  low  to  high,  these
categories  have  come  to  be  called  "lead,"  "short  lag,"  and  "long  lag,"  referring 
to  the  timing  of  voicing  onset  relative  to  release  of  occlusion.  Consonants 
with  these  VOTs  are  now  typically  termed  "voiced,"  "voiceless,"  and  "voiceless  aspirated.")  Though  the  formulation  of  these  three  ranges9  was  the  primary  conclusion  of  the  study,  it  was  also  clear  that  there  were  important  cross-language  differences  in  the  regions  actually  used  within  the  lead  and  long  lag  (i.e.,  voiced  and  voiceless  aspirated)  categories.  As  shown  in  Fig.  2.1,  Korean  apical  long  lags  were  so  long  that  their  distribution  barely  intersected  with  the  much  shorter  long  lags  of  Eastern  Armenian  and  Marathi.  Indeed,  the  distributions  of  voicing  lead  for  Marathi  and  Hungarian  velars  do  not  intersect,  because  the  former  is  so  long  and  the  latter  so  short.  Thus,  though  there  are  some  clear  regularities  in  the  cue  of  VOT,  there  are  also  clear  cross-language  variations  within  those  regularities.

7 The  hypothesis  that  "phonetics  is  universal,  and  the  language-specific  is  phonology"  has  had  a broad  appeal  in  linguistics,  and  has  some  supporters  even  now.  See  Keating  (1985)  for  arguments  against  this  position.

8 Korean,  with  three  stop  series,  and  the  four-series  languages  Hindi  and  Marathi  are  interesting  in  that  they  have  stop  categories  which  coexist  in  the  same  VOT  range,  and  are  differentiated  by  other  cues.

9 See  Raphael  et  al.  (1995)  for  evidence  of  a possible  fourth  range  in  between  short  and  long  lag.
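The three VOT ranges can be expressed as a simple classifier. This is a minimal sketch using the approximate cutoffs quoted above (the constant names and exact thresholds are assumptions for illustration). Its rigidity is the point: a single set of universal cutoffs cannot capture the cross-language differences within each range that Lisker and Abramson observed.

```python
# Approximate VOT ranges (ms), following the rounded values given in the text.
LEAD_BELOW_MS = -75.0        # lead ("voiced"): below about -75 ms
SHORT_LAG_MS = (0.0, 25.0)   # short lag ("voiceless"): about 0-25 ms
LONG_LAG_ABOVE_MS = 50.0     # long lag ("voiceless aspirated"): above about 50 ms


def vot_category(vot_ms):
    """Assign a VOT value (ms) to one of the three broad ranges, if any."""
    if vot_ms < LEAD_BELOW_MS:
        return "lead"
    if SHORT_LAG_MS[0] <= vot_ms <= SHORT_LAG_MS[1]:
        return "short lag"
    if vot_ms > LONG_LAG_ABOVE_MS:
        return "long lag"
    return "between ranges"  # the rounded summary assigns no category here


for vot in (-100.0, -40.0, 10.0, 35.0, 90.0):
    print(vot, vot_category(vot))
```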

[Figure 2.1: VOT ranges in ms (axis from about -150 to 100) for the long-lag categories Korean /th/, Eastern Armenian /th/, and Marathi /th/ and /ṭh/, and for the lead categories Hungarian /g/ and Marathi /g/.]

Figure  2.1.  Cross-language  VOT  range  differences.  The  horizontal  bar
represents  the  entire  range  of  the  sample,  and  the  whisker  represents  its
mean.  Data  from  Lisker  and  Abramson,  1964.  (Marathi  /ṭh/  is  retroflex.)




Lisker  and  Abramson  (1970),  another  classic  study,  played  a synthetic 
VOT  continuum  for  identification  to  speakers  of  Spanish,  English,  and  Thai. 
The  results  showed  not  only  that  the  identification  crossover  boundaries 
differ  for  different  languages,  but  also  that  those  boundaries  do  not  necessarily 
occur  at  exactly  the  category  boundaries  found  in  production  data.  The 
validity  of  the  identification  boundary  differences  is  buttressed  by  the 
discrimination  data  of  Abramson  and  Lisker  (1970),  but  the  most  striking 
cross-language  difference  of  Lisker  and  Abramson  (1970)  is  perhaps  the 
adherence  of  the  identification  curves  to  the  "gap"  in  the  consonant 
inventory  of  Thai.  In  identifying  the  synthetic  stimuli  mimicking  labial  and 
apical  places  of  articulation,  Thai  listeners  produced  two  boundaries,  dividing 
the  continuum  into  the  voiced,  voiceless,  and  voiceless  aspirated  ranges 
appropriate  to  their  phonemic  inventory  (/b,  p,  ph,  d,  t,  th/).  However,  for  the
velar  stimuli  they  produced  only  one  boundary,  as  is  appropriate  for  the
language.  The  phonemic  inventory  of  Thai  stops  is  asymmetric,  lacking  a
/g/,  so  the  remaining  boundary,  /k,  kh/,  is  in  a location  analogous  to  those  of
/p,  ph/  and  /t,  th/.

One  last  classic  article  is  Chen  (1970),  which  investigates  preceding 
vowel  length  as  a cue  for  syllable-final  obstruent  voicing.  Chen  calculates  the
ratio  of  the  (shorter)  vowels  before  voiceless  consonants  to  the  (longer) 
vowels  before  voiced  consonants  for  six  languages  and  finds  that  they  range 
from  0.78  to  0.87,  except  for  English,  which  has  a ratio  of  0.61.  After 
examining  several  possible  explanations,  he  opts  for  a physiological 
explanation10  for  all  the  languages  except  English,  which  instead  "may  be  accounted  for  by  the  added  perceptual  function  of  vowel  length  in  the  listener's  differentiation  of  voiced  vs.  voiceless  consonants"  (p.  156).

10  Specifically,  that  the  higher  intraoral  pressure  of  voiceless  consonants  requires  stronger  closure,  which  results  in  a faster  closing  gesture.
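Chen's measure is a simple ratio. A minimal sketch, assuming the ratio is taken between mean vowel durations and using hypothetical durations rather than Chen's measurements (the function name is illustrative):

```python
from statistics import mean


def vowel_duration_ratio(before_voiceless_ms, before_voiced_ms):
    """Mean vowel duration before voiceless consonants divided by the mean before voiced ones.

    Values below 1 mean the vowel is shorter before the voiceless consonant.
    """
    return mean(before_voiceless_ms) / mean(before_voiced_ms)


# Hypothetical vowel durations (ms) for one speaker.
ratio = vowel_duration_ratio([120, 130, 140], [200, 210, 220])
print(round(ratio, 2))  # 0.62
```

A ratio of about 0.62 would resemble Chen's English figure (0.61) rather than the 0.78 to 0.87 range of the other languages.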

Thus,  though  researchers  naturally  dispute  whether  particular  cues  are 
different  in  various  languages,  the  principle  is  established  that  such 
differences  do  occur,  at  least  for  some  cues.  This  principle  leads  quickly  to  the 
question  of  whether  the  difference  rests  exclusively  in  the  differing  cue 
distributions  found  across  languages,  or  whether  there  is  in  addition  a 
separate,  quantifiable  importance  ascribed  to  different  cues,  which  can  also 
be  different  across  languages.  This  is  the  central  question  of  this  dissertation. 

2.5  Cues  Interact 

Researchers  agree  that  since  there  are  multiple  cues  for  a single  percept, 
it  is  essentially  inevitable  that  they  interact  in  the  process  of  perception. 
However,  the  consensus  fails  with  regard  to  the  nature  of  the  interactions 
among  cues.  Though  the  reasons  for  the  divergence  need  not  be  discussed 
here,11  we  should  be  familiar  with  the  three  themes  being  pursued  (cue
rankings,  trading  relations,  and  quantitative  models  of  integration),  with  what
each  ultimately  furnishes  as  a description  of  cue  behavior,  and  with  some
drawbacks  of  each.

Rankings 

Abramson  and  Lisker  have  done  two  studies  on  stop  consonant 
voicing  showing  different  ways  to  determine  a ranking  of  the  importance  of 
cues.  In  Abramson  and  Lisker  (1985),  they  cross-varied  ranges  of  VOT  and  F0  onset  shift  in  the  following  vowel.  Identification  functions  showed  that  VOT  was  more  important  than  F0  shift,  since  only  ambiguous  VOT  values,  and  not  the  unambiguous  extreme  values,  allowed  F0  shift  to  affect  perception.  By  contrast,  the  weaker  F0  shift  cue,  even  at  its  most  extreme  values,  was  always  overpowered  by  unambiguous  VOT  values.12

11  Some  possible  reasons:  the  topic  is  young  and  the  research  is  difficult  to  design,  so  too  few  studies  have  been  done  on  which  to  base  a consensus;  the  few  researchers  on  the  topic  have  very  different  backgrounds;  the  researchers  have  fundamentally  different  theoretical  goals.

In  Abramson,  Lisker,  and  Koenig  (1990),  four  contrasting  cues  from  one 
token  each  of  intervocalic  /p/  and  /b/  were  modified,  singly  and  in 
combinations,  to  values  typical  of  the  opposing  phoneme.  Examination  of 
identification  rates  revealed  two  basic  results.  First,  cue  change  effects  were
cumulative:  the  more  cues  changed,  the  greater  the  effect.  Second,  and  even
more  reassuringly,  the  ranking  of  effects  was  almost  perfectly  consistent
between  single  and  multiple  cue  change  conditions.  In  combination,  these
results  tend  to  support  the  intuitive  expectations  that  some  cues  provide
more  perceptual  information  than  others,  and  that  more  cues  provide  more
information  than  fewer.  One  further  result  was  intriguing:  the  cue  rankings
for  /p/  and  /b/  are  asymmetric.  Closure  shortening  changed  the  most
percepts  from  /p/  to  /b/,  but  closure  silencing  changed  the  most  from  /b/  to
/p/.  Thus  it  seems  that  of  the  cues  tested,  closure  duration  is  most  important 
for  /p/  while  closure  buzz  is  most  important  for  /b/. 

The  main  problem  with  a ranking  is  that  it  is  only  a ranking,  not  a true 
metric.  Extreme  differences  in  the  importance  of  cues  could  be  masked  by 
identical  rankings,  while  minor  differences  could  provoke  a change  in  the 
rankings.  Since  rankings  are  only  relative,  there  is  no  way  to  determine  from 
a ranking  how  important  any  single  cue  is  in  the  absolute,  and  of  course, 
without  such  an  anchor,  cross-language  comparisons  of  ranks  are  meaningless.  Furthermore,  working  out  a complete  ranking  of  the  Lisker  list,  for  instance,  would  require  a great  deal  of  duplicate  effort,  testing  each  new  cue  against  successive  candidates  to  identify  its  rank.  Rankings  can  sometimes  reveal  interesting  patterns,  such  as  the  asymmetry  noted  above,  but  in  the  end  a real  metric  would  reveal  as  much,  and  more.

12 Whalen  et  al.  (1988)  used  reaction  time  data  to  show  that  the  weaker  F0  cue  is  still  perceived  at  extreme  VOT  settings,  even  though  the  percept  is  controlled  by  the  VOT.

Trading  Relations 


During  the  seventies  and  eighties,  researchers  began  to  investigate  and 
quantify  "trading  relationships."13  Often,  a particular  percept  of  a given  token
can  be  maintained  despite  the  weakening  of  one  of  its  cues  by  the
proportionate  strengthening  of  a different  cue.  In  such  cases,  the  two  cues  can
in  a sense  be  "traded"  against  one  another.  Determination  of  the  correct
compensatory  "proportion"  between  the  two  cues  is  a claim  both  about  the
process  of  speech  perception  and,  ultimately,  about  the  phonetic  structure  of 
the  contrast  studied. 

For  instance,  Repp  (1979)  investigates  the  contribution  of  aspiration 
amplitude  as  a cue  to  voicing  in  initial  stops.  By  varying  the  amplitudes  of 
both  aspiration  noise  and  of  the  following  vowel,  he  showed  that  perception 
of  consonantal  voice  (in  this  experiment,  identification  of  the  stimulus  as  /t/ 
or  /d/)  was  partly  dependent  on  the  ratio  of  aspiration  amplitude  to  vowel 
amplitude,  termed  A/V  ratio:  increasing  the  aspiration  amplitude  or 
decreasing  the  vowel  amplitude  favored  the  voiceless  percept.  Since  the 
stimuli  presented  varied  both  in  A/V  ratio  and  VOT,  Repp  was  able  to 
calculate  that  on  average,  a 1-dB  increase  in  the  A/V  ratio  shortened  the  VOT 
boundary  0.43  ms  toward  the  voiced  stimuli.  This  trading  relation  states,  therefore,  that  aspiration  amplitude  as  a cue  is  perceived  relative  to  the  amplitude  of  the  following  vowel,  and  that  1 dB  of  A/V  ratio  provides  the  same  amount  of  phonetic  information  as  0.43  ms  of  VOT.

13 See  Bailey  and  Summerfield  (1980),  Fitch  et  al.  (1980),  and  Repp  (1979,  1984).  See  especially  Repp  (1982,  1983)  for  thorough  theoretical  treatments.
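Repp's trading relation can be read as a conversion factor between the two cues. The sketch below assumes, for illustration only, that the trade is linear over the range of interest and starts from a hypothetical 30-ms boundary; only the 0.43 ms/dB figure comes from Repp (1979), and the function name is illustrative.

```python
# Average trading ratio reported by Repp (1979): each 1-dB increase in the
# aspiration-to-vowel (A/V) amplitude ratio moved the VOT boundary 0.43 ms
# toward the voiced end of the continuum.
MS_VOT_PER_DB_AV = 0.43


def shifted_vot_boundary(boundary_ms, delta_av_db):
    """Predict the VOT boundary after changing the A/V ratio by delta_av_db.

    Assumes (hypothetically) that the trade is linear over the range used.
    A positive delta (more aspiration relative to the vowel) lowers the boundary.
    """
    return boundary_ms - MS_VOT_PER_DB_AV * delta_av_db


# Hypothetical starting boundary of 30 ms; raise the A/V ratio by 10 dB.
print(shifted_vot_boundary(30.0, 10.0))  # about 25.7 ms
```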

Trading  relations  are  clearly  an  improvement  over  ranks,  since  they 
quantify  the  relationship  between  cues.  It  is  possible  to  imagine  a complete 
description  of  the  voicing  contrast  through  a full  working-out  of  the  Lisker 
list  in  a network  of  quantified  trading  relations.  However,  since  the  research
needed  to  quantify  trading  relations  is  fairly  involved  to  plan  and  carry  out
(and  since  the  researchers  are  often  more  interested  in  the  perceptual  process
than  in  descriptive  phonetics),  much  of  the  research  on  trading  simply
documents  the  presence  of  trading,  without  determining  the  ratios  (e.g.,  Repp,
1983).  Moreover,  trading  relations  suffer  from  two  of  the  flaws  attributed
above  to  rankings:  they  cannot  furnish  any  autonomous  information  about  a
single  cue  in  isolation,  and  they  cannot  render  any  absolute  information  about
a cue's  importance.  Thus,  though  they  can  provide  a quantified  contrast
across  languages,  they  ultimately  cannot  tell  what  that  contrast  may  signify.

Perceptual  Decision  Models 


Dominic  Massaro  and  Terry  Nearey  (with  their  respective  colleagues) 
have  both  published  mathematical  models  of  speech  perception.  Massaro's 
model  is  called  the  FLMP  (Fuzzy  Logical  Model  of  Perception)  and  Nearey  has 
worked  with  the  NAPP  (Normal  A Posteriori  Probability)  model  and  related 
log-linear  models.  All  are  designed  to  mimic  certain  aspects  of  the  language 
listener's  behavior. 

Though  Massaro's  work  is  best  known  for  its  emphasis  on  the  visual 
contribution  to  normal  speech  perception,  it  is  discussed  here  because  it
represents  a fully  worked-out  mathematical  model  of  speech  perception.
Specifically,  the  FLMP  can  accept  naturalistic  speech  input,  it  uses  principled 
methods  for  assigning  numerical  values  to  represent  that  input,  and  it  uses 
those  values  to  ascribe  a category  membership  to  the  input. 

According  to  the  FLMP  framework, 

[W]ell-learned  patterns  are  recognized  in  accordance  with  a general
algorithm,  regardless  of  the  modality  or  particular  nature  of  the 
patterns.  . . . Continuously  valued  features  are  evaluated,  integrated, 
and  matched  against  prototype  descriptions  in  memory,  and  an 
identification  decision  is  made  on  the  basis  of  the  relative  goodness  of 
match  of  the  stimulus  information  with  the  relevant  prototype 
descriptions.  (Massaro,  1987,  pp.  16-17) 

There  are  thus  three  aspects  central  to  the  FLMP:  continuous  stimulus
variables  (rather  than  discrete  or  categorical  ones);  the  ordered  processing
steps  of  evaluation,  integration,  and  matching;  and  the  role  of  memory  in
maintaining  the  prototypes.  Though  a full  explication  of  the  model  would  be
too  detailed  for  presentation  here,  its  framework  seems  quite  well  adapted  to
the  treatment  of  speech  perception.  There  are,  however,  technical  reasons
why  the  model  is  poorly  suited,  at  least  in  its  present  form,  to  certain  kinds  of
speech  research.
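For two response alternatives, the FLMP's integration and decision stages are commonly written as a multiplicative rule: the support for alternative A is the product of the features' truth values t, the support for B is the product of the values (1 - t), and the probability of responding A is the support for A divided by the summed support for A and B. A minimal sketch of that rule follows; the function name is hypothetical, and this is not Massaro's own software.

```python
from math import prod


def flmp_choice_probability(truth_values):
    """Two-alternative FLMP decision, as the model is usually stated.

    truth_values: one fuzzy truth value per feature, each in [0, 1], giving how
    well that feature matches prototype A (so 1 - t is its match to prototype B).
    """
    if any(not 0.0 <= t <= 1.0 for t in truth_values):
        raise ValueError("fuzzy truth values must lie in [0, 1]")
    support_a = prod(truth_values)                   # integration: multiply the matches to A
    support_b = prod(1.0 - t for t in truth_values)  # ...and the matches to B
    total = support_a + support_b
    if total == 0.0:
        raise ValueError("features give no support to either alternative")
    return support_a / total                         # decision: relative goodness of match


# A fully ambiguous feature (0.5) leaves the other feature's value unchanged:
print(flmp_choice_probability([0.5, 0.8]))   # about 0.8
# Two moderately good features reinforce each other:
print(flmp_choice_probability([0.7, 0.7]))   # about 0.845
```

Note that the rule only accepts inputs inside [0, 1], which is exactly the constraint the objections below take up.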

The  first  objection  regards  the  scale  used  to  evaluate  the  stimulus 
variables.  As  its  name  implies,  the  FLMP  adopts  the  fuzzy  logic  scale  ranging 
from  0 to  1,  where  1 represents  absolute  truth,  0 absolute  falsehood,  and  0.5 
ambiguity.  In  the  FLMP,  the  value  1 is  assigned  when  there  is  a perfect  match 
between  a stimulus  feature  and  its  representation  in  the  prototype.  The  value 
0 is  assigned  to  a perfect  match  between  the  opposing  feature  and  its  opposing 
prototype.  Feature  evaluation  thus  uses  a metric  bounded  by  2 terminal 
poles.  But  consider  the  use  of  consonant  hold  duration  as  a cue  to 
intervocalic  stop  voicing:  if  the  means  of  the  measured  durations  (voiceless
longer  and  voiced  shorter)  represent  the  feature  in  the  prototype,  then  the
two  prototypes  cannot  be  terminal  poles:  the  values  of  about  half  the  tokens
would  be  undefined,  because  they  fall  outside  the  interval  bounded  by  the
two  means.
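The problem can be made concrete with one naive way of mapping durations onto the fuzzy scale: a straight line running from 0 at the voiced mean to 1 at the voiceless mean. The FLMP does not prescribe this mapping; the function, the means, and the token values are all hypothetical, chosen only to show tokens beyond either mean falling outside [0, 1].

```python
def naive_truth_value(duration_ms, voiced_mean_ms, voiceless_mean_ms):
    """Hypothetical linear evaluation: 0 at the voiced mean, 1 at the voiceless mean."""
    return (duration_ms - voiced_mean_ms) / (voiceless_mean_ms - voiced_mean_ms)


# Hypothetical closure-duration means (ms) and tokens from both distributions.
VOICED_MEAN, VOICELESS_MEAN = 60.0, 100.0
tokens = [50.0, 70.0, 80.0, 90.0, 110.0]
values = [naive_truth_value(d, VOICED_MEAN, VOICELESS_MEAN) for d in tokens]
print(values)                                      # [-0.25, 0.25, 0.5, 0.75, 1.25]
print([v for v in values if not 0.0 <= v <= 1.0])  # [-0.25, 1.25]: outside the fuzzy scale
```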

Consider  also  the  problem  of  applying  the  FLMP  to  the  experiment 
design  of  Miller  (1994).  In  one  of  several  experiments  investigating  internal 
category  structure,  she  constructed  a /bi/~/pi/  series  with  varying  VOTs,  and 
continued  the  series  into  super-long  VOTs,  which  she  designates  */pi/.  She 
asked  the  subjects  to  rate  how  well  the  tokens  represented  the  category  /pi/, 
and  derived  a "goodness  function"  from  the  responses.  Figure  2.2  shows  that 
the  function  peaked  at  around  60-70  ms,  which  is  apparently  the  value  of  the  /pi/  prototype  in  these  conditions.  The  function  dropped  off  sharply  to  the  left,  with  a minimum  at  10-20  ms,  presumably  at  or  near  the  /bi/  prototype  value.  To  the  right  of  the  peak,  in  the  */pi/  range,  the  goodness  function  shows  a fairly  gentle,  even  roll-off,  such  that  the  last  token,  with  a VOT  of  200  ms,  is  rated  only  slightly  better  than  the  10-20  ms  /bi/s.  Clearly,  a */pi/  can  be  a poor  representative  of  the  /pi/  category.  The  problem  for  the  FLMP  is  that  such  a token  is  not  thereby  a good  representative  of  /bi/,  whereas  on  a bounded  scale  with  /pi/  and  /bi/  at  opposite  poles,  a low  match  to  /pi/  can  only  mean  a high  match  to  /bi/.  Thus,  the  use  of  a bounded  scale  for  evaluation  is  both  a logical  and  practical  problem  for  speech  research.

Figure  2.2.  Goodness  ratings  of  stimuli  as  /pi/  as  a function  of  VOT.  From  Miller,  1994.  See  text  for  discussion.

The  second  objection  also  addresses  the  evaluation  scale,  but  in  a 
different  way.  Note  that  the  fuzzy  logic  values  are  bilaterally  symmetric  about 
0.5,  the  ambiguity  point,  and  that  this  is,  at  least  qualitatively,  analogous  to 
the  situation  described  above  with  consonant  hold  duration;  there  are  two 
distributions  with  a "boundary"  located  centrally  between  the  two.  However, 
there  are  many  cues  for  which  the  presence  of  a signal  and  its  associated 
distribution  contrast  with  the  absence  of  a signal,  and  the  consequent  absence 
of  any  opposing  distribution.  For  instance,  aspiration  amplitude  is  a cue  for 
voiceless  stops  in  English  (Repp,  1979),  but  in  voiced  stops,  aspiration,  if  not 
the  burst  itself,  is  absent  or  negligible  (Klatt,  1975).  VOT  is  also  a cue  for 
voicing,  but  the  concept  of  VOT  is  meaningless  for  intervocalic  voiced 
stops.14  Even  typically  bilateral  cues  can  become  asymmetric,  as  consonant 
hold  duration  does  in  Spanish,  where  voiced  stops  weaken  into  fricatives 
under  certain  phonological  conditions  (Harris,  1969).  Thus,  the  obligatory 
bilateral  symmetry  of  the  evaluation  scale  is  problematic  for  speech  research.

14 See  Klatt  (1975)  and  the  related  discussion  in  Chap.  5 for  an  alternate  view.

The  third  objection  concerns  the  nature  of  the  prototypes  in  memory.  Of  them,  Massaro  states:


A prototype  is  a category  and  the  features  of  the  prototype  correspond 
to  the  ideal  values  that  an  exemplar  should  have  if  it  is  a member  of 
that  category.  The  exact  form  of  the  representation  of  these  properties 
is  not  known  and  may  never  be  known.  (Massaro,  1987,  p.  17) 

Regardless  of  its  form,  the  content  of  the  prototype  presumably  determines
the  difference  between  "data"  and  "information."  As  Massaro  uses  the  terms
(Massaro,  1987,  pp.  13-14),  the  former  has  ecological  validity  and  is  available
for  use  by  the  perceiver,  while  the  latter  has  "functional  validity"  and  is
actually  used  by  the  perceiver.15  Though  he  values  this  distinction,  Massaro
admits  a certain  lack  of  interest  in  classifying  stimulus  characteristics  as  one
or  the  other:

Our  work  might  be  characterized  by  a primary  concern  for  how 
information  is  evaluated  and  integrated  (without  a detailed  assessment 
of  what  is  actually  informative).  (Massaro,  1987,  p.  14) 

From  a linguistic  viewpoint,  the  lack  of  an  explicit  method  for  describing
prototypes  is  a serious  shortcoming.  Since  different  phonologies  would
clearly  result  in  different  perceptual  prototypes,  the  lack  of  any  description  of
prototypes  renders  impossible  such  characteristic  linguistic  tasks  as  the
cross-linguistic  comparison  of  different  prototypes  or  of  their  features.  Thus,
the  lack  of  prototype  descriptions  is  a problem  for  linguistic  applications.16

Taken  in  perspective  against  the  whole  framework  of  FLMP,  these  three  problems  (an  evaluation  scale  that  is  bounded  and  that  is  bilaterally  symmetric,  and  prototypes  that  are  undescribed)  are  fairly  minor,  and  they  may  well  be  surmountable  even  for  the  types  of  linguistic  and  phonetic  research  used  above  to  exemplify  them.  However,  a literature  search  seems  to  indicate  that  Massaro  and  colleagues  are  extending  the  current  form  of  the  FLMP  to  other  problems  in  speech  and  to  other  domains  rather  than  refining  its  application.17  In  any  case,  the  existence  of  one  such  model  hardly  precludes  the  exploration  of  others.  Two  are  presented  by  Terry  Nearey.

15 Massaro  attributes  the  distinction  between  ecological  and  functional  validity  to  Egon  Brunswik  (see  especially  Brunswik,  1955a,  1955b).

16 The  concept  of  prototypes  is  being  studied  and  applied  by  other  psychologists  (e.g.,  Rosch,  1977,  1978;  Nosofsky,  1988)  and  speech  researchers  (e.g.,  Kuhl,  1992;  Sussman  & Lauckner-Morano,  1995),  but  this  work  has  not  been  integrated  into  the  FLMP.

Nearey's  NAPP  (Nearey  & Hogan,  1986)  uses  surveyed  distributions  of
speech  data  to  provide  probability  functions,  which  are  used  to  classify
possible  future  productions.  The  functions  are  then  optimized  by  comparison
with  identification  data  from  perceptual  tests.  He  compares  the  NAPP  to  a
Thurstonian  model  (Thurstone,  1927),  considered  classic  within
psychophysics.  The  two  differ  crucially  in  that  the  NAPP  is  "primed"  with
means  and  standard  deviations  of  the  populations  to  be  perceived,  while  the 
Thurstonian  model  is  primed  with  the  borders  between  the  populations.  The 
comparison  leads  to  fairly  equivocal  results,  so  Nearey  justifies  a preference 
for  the  NAPP  model  partly  because  of  the  positivistic  nature  of  the  priming 
parameters,  and  partly  because  of  its  applicability  to  multivariate, 
multicategorical  choices. 
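The core idea of an a posteriori probability model can be sketched for a single cue: each category's production distribution is summarized by a normal distribution with a mean and standard deviation, and a token is assigned a probability for each category via Bayes' rule. This is a minimal sketch of the general idea only, assuming one cue, equal priors by default, and hypothetical closure-duration values for /d/ and /t/; Nearey and Hogan's model is multivariate, and its optimization against identification data is omitted here. All names are illustrative.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def posterior_probabilities(x, categories, priors=None):
    """A posteriori probability of each category for a single cue value x.

    categories: dict mapping category name -> (mean, sd) of its production distribution.
    priors: optional dict of prior probabilities; equal priors if omitted.
    """
    if priors is None:
        priors = {name: 1.0 / len(categories) for name in categories}
    weighted = {name: priors[name] * normal_pdf(x, m, s)
                for name, (m, s) in categories.items()}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical closure-duration distributions (ms) for intervocalic /d/ and /t/.
closure = {"d": (60.0, 15.0), "t": (100.0, 15.0)}
print(posterior_probabilities(80.0, closure))   # equal SDs: 0.5 each at the midpoint
print(posterior_probabilities(100.0, closure))  # /t/ strongly favored
```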

Nearey  and  Hogan  also  mention  that  the  NAPP  model  is  related  to  a 
family  of  log-linear  models  (henceforth  "LLMs"),  and  it  is  upon  these  that 
Nearey  elaborates  (1990,  1991,  1992),  using  data  from  Whalen  (1989).  Part  of  Whalen's  study  tests  the  interdependence  of  consonant  and  vowel  choice  by  varying  fricative  pole  frequency  and  F2  frequency  in  stimuli  presented  for  identification  as  "see,"  "Sue,"  "she,"  or  "shoe"  (i.e.,  /si,  su,  ʃi,  ʃu/).  Nearey  develops  equations  to  model  the  identification  responses  of  Whalen's  listeners,  and  through  analysis  of  the  terms  included  in  the  model  and  the  resulting  configurations  of  the  response  spaces,  he  presents  arguments  for  the  segment  as  the  basic  unit  of  speech  perception  (1990).  He  then  presents  a hypothesis  regarding  a mechanism  for  sound  change  by  hyper-  and  hypocorrection,  as  suggested  by  Ohala  (1981,  1992).

17 Indeed,  Massaro  (1992),  entitled  "Broadening  the  Domain  of  the  Fuzzy  Logical  Model  of  Perception",  includes  an  inventory  of  articles  on  such  topics  as  reading,  visual  depth  perception,  and  sound  localization.
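In the simplest two-category, two-cue case, a log-linear model of identification responses takes the form of a logistic function of a weighted sum of cue values. The sketch below uses made-up coefficients and simplifies Nearey's multicategory models considerably; it is meant only to show how such a model represents each cue by a coefficient, and how a trading ratio between two cues falls out as the ratio of their coefficients (an illustrative reading, not a claim made by Nearey). All names are hypothetical.

```python
from math import exp

# Hypothetical coefficients for a two-category, two-cue logistic model:
# logit P(voiceless) = B0 + B_VOT * vot_ms + B_AV * av_db
B0, B_VOT, B_AV = -7.5, 0.25, 0.1


def p_voiceless(vot_ms, av_db):
    """Predicted probability of a voiceless response under the hypothetical model."""
    logit = B0 + B_VOT * vot_ms + B_AV * av_db
    return 1.0 / (1.0 + exp(-logit))


def vot_boundary(av_db):
    """VOT (ms) at which the model predicts 50% voiceless responses."""
    return -(B0 + B_AV * av_db) / B_VOT


# Along the 50% boundary, B_VOT * dVOT + B_AV * dAV = 0, so the trading ratio is
# -B_AV / B_VOT ms of VOT per dB (here -0.4): 0.4 ms shorter boundary per dB.
print(vot_boundary(0.0), vot_boundary(10.0))  # 30.0 26.0
```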

Both  of  Nearey's  models  have  two  advantages  over  Massaro's  FLMP.
First,  their  scales  are  the  acoustic  scales  upon  which  the  cues  are  measured,
and  are  therefore  not  artificially  bounded  by  the  0 and  1 endpoints  of  fuzzy  logic.
Second,  thanks  partly  to  the  acoustic  scales,  they  are  capable  of  representing
claims  about  the  prototypical  values  of  the  cues  in  memory.  Indeed,  the
NAPP  directly  uses  the  mean  and  standard  deviation  as  the  prototype,
although  Nearey's  discussions  of  the  analogy  between  NAPP  and  LLMs  do  not
address  how  the  production  distributions  might  be  represented  in  the
log-linear  equations.

Nearey's  models  nonetheless  present  drawbacks  and  unanswered 
questions,  at  least  as  currently  presented.  First,  as  with  the  FLMP,  it  is  not 
clear  how  either  the  NAPP  or  LLM  models  would  represent  "single  category" 
cues,  such  as  stop  burst  amplitude,  intervocalic  VOT,  or  Spanish  stop 
durations,  since  Nearey's  explanations  have  presented  only  multicategorical
oppositions.  Second,  both  models  use  previously  collected  identification 
curves  to  predict  or  refine  the  identification  predictions  of  the  model. 
Certainly,  some  of  the  cues  of  the  Lisker  List  (and  presumably  elsewhere)  are 
sufficiently  weak  that  they  are  unable  to  effect  a change  of  percept  on  their 
own.  Are  the  models  nonetheless  able  to  represent  the  information 
contributed  by  these  cues?  Third,  Nearey's  analyses  of  LLMs  rely  heavily  on 
bivariate  territorial  maps.  While  a factorial  experiment  could  easily  be 
designed  to  test  a bivariate  (or  multivariate)  model,  the  use  of  single  cue 
designs,  whether  as  tests  or  extensions  of  the  models,  requires  discussion. 
Finally,  and  crucially  for  linguistic  considerations,  consider  again  the  problem 
of  cross-linguistic  comparisons.  The  NAPP  is  clearly  capable  of  modeling
different  languages'  phonetic  patterns  by  reference  to  the  production
distributions  and  by  optimization  to  known  identification  curves.  However, 
Nearey  never  discusses  whether  any  measure  directly  and  uniquely 
representing  the  perceptual  importance  of  a given  cue  is  derivable.  Such  a 
measure  would  have  to  be  independent  of  both  the  production  distributions 
and  any  lexical  bias  effects.  The  problem  is  of  course  worse  for  the  LLMs, 
where  the  ruling  territorial  analogy  leaves  the  representation  of  the 
production  distributions  unclear  as  well. 

Though these characteristics of the NAPP and LLM are current
drawbacks from our point of view, they cannot be considered fatal flaws in
either model. Both models are fairly new, they and their interpretations
are still being explored, and none of the problems outlined here requires
resolution in the context of the problems Nearey has thus far addressed. Still,
as with Massaro's FLMP, the existence of these shortcomings invites
exploration via other frameworks, whose results, like Whalen's, might even
be incorporable into one of the three models: FLMP, NAPP, or LLM.

2.6  General  Critique  of  the  Literature 

A reasonable  summary  of  the  content  of  the  literature  and  critiques 
above  might  be  formulated  as  follows:  The  acoustic  information  used  for 
phonemic  perception  is  encoded  in  multiple  cues,  and  listeners  integrate 
information  from  more  than  one  cue  during  perception.  These  cues  are  at 
least  partially  redundant,  since  trading  relations  allow  listeners  to  maintain 
the  same  percept  in  the  face  of  "strengthening  and  weakening"  balanced 
across pairs of cues. Languages differ, at least, in the typical settings they
exhibit for at least some cues. Over and above any effect
of a cue's setting in a given token, some cues are more perceptually important
than others. Progress has been made, but much remains to be done in the
effort  to  develop  quantitative  models  of  speech  perception  and  the 
representation  of  cues  in  such  a model. 

The  literature  does  not  seem  to  include  any  technique  directly 
applicable  to  the  cross-language  comparison  of  perceptual  use  of  a given  cue. 
Standard  survey  designs  can  easily  reveal  cross-language  differences  in 
acoustic-phonetic  structure,  but  neither  difference  nor  similarity  proves 
anything  concerning  the  perceptual  use  of  that  structure.  (This  claim  merely 
recognizes that Massaro's distinction between ecological and functional
validity  applies  to  cues.)  Rankings  may  reveal  cross-language  differences  if 
the  differences  are  sufficiently  great,  but  their  failure  to  reveal  differences  is 
poor  evidence  of  similarity.  Moreover,  they  cannot  quantify  levels  of 
difference,  since  they  are  ordinal  rather  than  interval  or  ratio  measures  (cf. 
Stevens, 1946). As for trading relations, their main utility may be to prove
that  the  cues  traded  are  indeed  used  in  the  perception  of  the  contrast 
investigated.  (This  may  be  why  researchers  often  decide  it  is  unnecessary  to 
take  the  trouble  to  quantify  them  exactly.)  It  is,  however,  not  clear  how 
trading  relations  depend  either  on  acoustic  structures,  which  are  known  to 
differ  across  languages,  or  on  perceptual  use,  which  may  or  may  not  differ 
across  languages.  Finally,  the  quantitative  perceptual  models  (Massaro's 
FLMP  and  Nearey's  NAPP  and  log-linear  models)  may  be  capable  of 
rendering  a measure  of  perceptual  use  allowing  cross-language  comparisons. 
Thus  far,  however,  the  proponents  have  not  explicated  the  models  with  that 
goal,  so  the  matter  remains  undetermined. 

There are other cue-centered weaknesses in much of the literature
that are unrelated to the question of cross-language comparisons. For
instance, despite the common recognition that multiple cues contribute to
perception of a single contrast, phoneme boundaries are often experimentally
investigated  by  creating  a stimulus  series  varying  a single  cue  in  an  otherwise 
identical  (synthetic  or  natural)  prototype.  If  the  prototype  happens  to  fix  one 
or  more  non-experimental  cues  at  extreme  values,  the  resulting  identification 
and  discrimination  curves  will  be  descriptively  inaccurate  to  the  extent  that 
the  experimental  cue  has  been  inadvertently  traded  with  the  prototype's  fixed 
cues. 

Consider  also  the  difficulty  of  developing  a complete  description  of  any 
contrast  via  perceptual  experiments  extending  an  existing  line  of  research  and 
integrating  with  its  previous  results.  Since  rankings  and  trading  relations 
specify  a cue  relative  to  other  cues,  each  new  cue  to  be  added  to  the 
description  would  require  remanipulating  and  reinvestigating  at  least  one, 
and  probably  many  other,  cues.  This  redundant  work  would  be  avoided  if 
there  were  a paradigm  whereby  cues  could  be  investigated  one  at  a time, 
autonomously,  with  the  results  then  being  integrated  in  the  overall 
description of the contrast. Massaro's and Nearey's models seem amenable to
such autonomous measures, but neither author has discussed the requisite
descriptive program or specified how it might proceed.

In  sum,  there  are  several  aspects  of  acoustic  speech  cue  perception 
which  are  unaddressed  or  poorly  addressed  in  the  literature.  The 
quantification  of  cue  importance  and  its  comparison  across  languages  are 
certainly  two  of  those  aspects. 


CHAPTER  3 

APPLYING  SIGNAL  DETECTION  THEORY  TO  CUES 


The  measure  for  quantifying  cue  importance  which  is  explored  in  this 
research  is  essentially  a d'  sensitivity  measure  from  signal  detection  theory. 
The  appropriate  application  of  signal  detection  techniques  warrants 
explanation,  so  that  the  resulting  d'  can  be  seen  as  a valid  cross-language 
measure  of  cue  sensitivity.1 

3.1  The  Basics  of  Signal  Detection 


Signal  detection  theory  (SDT)  was  formulated  in  sensory  perception 
research  in  the  1950s  and  1960s,  and  gained  credence  when  SDT  was  able  to 
furnish  measurement  techniques  superior  to  perceptual  thresholds,  which 
had  been  used  up  until  then  and  had  finally  been  found  inadequate. 
Threshold  theory  held  that  the  performance  of  a sensory  transducer  was 
revealed  by  the  intensity  of  the  faintest  stimulus  reported  as  perceivable  by 
the  subject,  but  the  theory  failed  when  that  stimulus  intensity  was  shown  to 

1The  discussion  in  this  chapter  is  not,  and  would  be  inadequate  as,  an 
introduction  to  or  outline  of  signal  detection  theory.  Rather,  it  is  a guide  to 
those  concepts  of  SDT  used  in  this  study,  and  a justification  of  their 
application  here.  As  such  it  necessarily  presumes  some  familiarity  with  SDT 
or  other  psychophysical  scaling  techniques.  Those  interested  in  an  exposition 
of  SDT  might  consider  the  following:  The  landmark  and  still  definitive 
exposition  of  the  theory  and  its  techniques  is  Green  and  Swets  (1966).  For  a 
thorough  pedagogical  exposition  see  McNicol  (1972),  or  for  more  recent 
refinements and extensions, Macmillan and Creelman (1991). Baird and
Noma (1978) have chapter-length introductions to signal detection and other
psychophysical  models  and  theories,  and  brief  introductions  can  be  found  in 
any  textbook  on  sensation,  perception,  or  psychophysics. 


be  affected  by  a number  of  factors  unrelated  to  either  the  sense  or  the 
stimulus,  specifically  factors  related  to  the  subject's  motivation  and  attitude. 

SDT  provided  a way  to  separate  these  aspects  of  a subject's  perceptual 
performance,  the  exclusively  sensory  function  of  the  subject  as  a transducer, 
and  the  more  complex  intentional  aspects  of  the  subject's  inclination  to 
respond. The former was termed sensitivity and measured by d' (d-prime);
the latter was termed the criterion and measured by β (beta). These could be
manipulated  separately  through  experimental  design,  the  first  through 
characteristics  of  the  stimulus  and  its  presentation,  the  second  in  various 
ways,  including  giving  different  rewards  and  punishments  to  the  subjects  for 
various  kinds  of  performance. 

The  fundamental  advance  of  SDT  was  the  recognition  that  repeated 
presentation  of  an  identical  stimulus  provokes  variable,  not  identical,  sensory 
transduction  responses.  This  allowed  modeling  of  sensory  and  perceptual 
performance  with  an  array  of  already  well  known  statistical  techniques. 

Thus,  repeated  presentation  of  a given  stimulus  gave  rise  to  a distribution  of 
sensory  responses,  with  (on  some  abstract,  possibly  neural  scale)  a central 
tendency  and  a level  of  variation.  Discrimination  of  a given  pair  of  similar 
stimuli  depended  directly  on  the  amount  of  overlap  of  the  sensory  response 
distributions,  and  only  indirectly  on  the  stimuli  themselves.  Similarly, 
detection  of  a faint  stimulus  depended  on  the  overlap  between  its  sensory 
response  distribution  and  that  of  "background  noise"  due  to,  among  other 
things,  random  neuron  firing  rates. 

If  two  stimuli  provoke  response  distributions  which  are  far  apart,  with 
little  overlap,  then  that  sensory  system  shows  high  sensitivity  to  the 
difference  between  those  two  stimuli,  and  the  subject  distinguishes  them 
easily. If they provoke response distributions which are close together, with a
large amount of overlap, then that sensory system shows low sensitivity to
the  difference  between  them,  and  the  subject  distinguishes  them  poorly. 
Conceived  of  this  way,  sensitivity  can  be  quantified  by  measuring  separation 
between  the  two  distributions,  as  shown  in  Fig.  3.1: 


[Figure 3.1 panels: "Higher Sensitivity" and "Lower Sensitivity"]


Figure  3.1.  Schematic  representation  of  sensitivity  as  the  separation  of  two 
distributions.  Signal  detection  theory  models  sensitivity  as  d',  and  quantifies 
it  by  measuring  the  distance  between  the  two  means. 


As  a global  result  of  sensory  function,  including  both  transduction  and 
cognition,  such  distributions  and  their  separation  would  be  difficult  or 
impossible  to  measure  directly.  Fortunately,  simplifying  assumptions  from 
statistics  allow  such  difficulties  to  be  dispensed  with.  Typically,  in  the  absence 
of  indications  otherwise,  distributions  are  assumed  to  be  normal,  and 
variances of distributions are assumed to be equal.2 With these assumptions,
sensitivity  is  measurable  as  the  distance  between  the  means  of  the  two 
distributions  in  units  of  standard  deviation.  This  measurement  is  d'. 
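Under the equal-variance normal model just described, d' is simply the difference between the two means divided by the common standard deviation. The minimal Python sketch below (Python 3.8 or later) is illustrative only: the function name `d_prime_from_means` and the example means are hypothetical. The standard-library `statistics.NormalDist.overlap` method shows that the area shared by the two distributions shrinks as d' grows, which is the contrast between the two panels of Fig. 3.1.

```python
from statistics import NormalDist


def d_prime_from_means(mean_noise, mean_signal, sd):
    """Separation of two equal-variance distributions, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_signal - mean_noise) / sd


# Hypothetical internal-response distributions on an arbitrary scale
noise = NormalDist(mu=0.0, sigma=1.0)
for label, signal_mean in [("higher sensitivity", 3.0), ("lower sensitivity", 0.5)]:
    signal = NormalDist(mu=signal_mean, sigma=1.0)
    d = d_prime_from_means(noise.mean, signal.mean, noise.stdev)
    # overlap() gives the area shared by the two density curves (0 to 1)
    print(f"{label}: d' = {d:.2f}, overlap = {noise.overlap(signal):.3f}")
```

Because d' is expressed in standard deviation units, the same value is obtained whatever physical scale the underlying stimuli were measured on.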

Note  that  although  the  stimuli  are  specified  on  some  physical  scale, 
such  as  hertz  or  decibels,  sensitivity  to  the  stimuli  is  measured  on  an  abstract 
scale.  Sensitivities  to  different  kinds  of  stimuli  (e.g.,  weight  and  frequency) 

2The  validity  of  such  assumptions  is  testable,  and  other  assumptions  are 
possible.  See  Green  and  Swets  (1966). 


are  thus  theoretically  capable  of  comparison,  even  though  completely 
different  senses  may  be  involved  in  their  perception.  In  the  absence  of  SDT 
(or  some  other  perceptual  link)  there  would  not  be  any  clear  way  of  relating 
the  two  scales,  since  they  have  no  absolute  threshold,  nor  any  nonarbitrary 
step  size. 


3.2  Applying  the  Mechanics  of  SDT 

The mechanics of SDT require the manipulation of (in the simplest
experiments) two error rates, determined when two different stimuli are
presented and the subjects misperceive some proportion of each stimulus as
the other. Since the archetypal SDT experiment concerns detection of a signal
in noise, the two stimulus types are traditionally termed "signal" and "noise."
The errors are termed "misses" when a signal is present but not perceived,
and "false alarms" when no signal is present but one is mistakenly perceived.
Correct responses are termed "hits" and "correct rejections," respectively. (In
studies where the two stimuli are both signals, one is arbitrarily assigned the
"noise" designation in order to maintain terminological consistency.) These
alternatives are shown in Table 3.1.

Table 3.1.

Response Classifications in Signal Detection Theory

                          Response
Stimulus        Signal          Noise
Signal          Hit             Miss
Noise           False Alarm     Correct Rejection

Subjects  may  be  required  to  do  two  different  types  of  tasks.  In 
identification  tasks,  the  subjects  are  presented  the  stimuli  one  at  a time  and 
asked  to  classify  each  stimulus  as  either  signal  or  noise.  Discrimination  tasks 
may  follow  several  different  patterns,  but  in  each,  the  stimuli  are  presented  in 
sets of two or three. If presented in pairs, the subject may be asked whether the pair
are the same or different, or which member of the pair contains the signal. If
presented in sets of three, in the pattern ABX or AXB, the subject may be asked
whether  the  X stimulus  is  like  the  A or  the  B stimulus.  The  choice  of  which 
task  to  use  is  based  on  practical  concerns,  since  the  necessary  rates  can  be 
calculated  for  all  of  them. 
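As one instance of such a calculation, for the paired task in which the subject reports which member of the pair contains the signal (two-alternative forced choice), a standard SDT result estimates d' from the proportion correct p as √2 · z(p). The sketch below assumes an unbiased observer and the equal-variance normal model. The function name is hypothetical. Same-different and ABX designs require different conversions, which are not shown here.

```python
from math import sqrt
from statistics import NormalDist


def d_prime_2afc(proportion_correct):
    """Estimate d' from proportion correct in an unbiased two-alternative forced choice."""
    if not 0.0 < proportion_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    # z(p): the standard normal deviate whose cumulative probability is p
    z = NormalDist().inv_cdf(proportion_correct)
    return sqrt(2.0) * z


# Chance performance (50% correct) corresponds to d' = 0
chance = d_prime_2afc(0.5)
```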

The stimuli in SDT experiments must be designed and presented with
extreme care to force the subject to rely on exactly the aspect of the
stimulus being investigated. For instance, in an investigation of the
discrimination of acoustic spectrum configurations, the signal might be a
small relative increment in the amplitude of a component at some particular
frequency. That increment, besides changing the spectral configuration, also
causes a slight increase in the overall amplitude of the sound. In order to
force the subject to judge on spectral qualities, the overall amplitude of the
stimuli is varied randomly, so that the signal tokens are sometimes louder,
sometimes  softer  than  the  noise  tokens.  This  way,  the  only  useful 
information  available  to  the  subject  is  that  of  the  experimental  variable, 
spectrum  configuration. 
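This random level variation (often called "roving") can be sketched as follows. The sketch assumes each token is held as a list of samples and that a uniform ±5 dB range is adequate. Both the range and the function names are hypothetical choices for illustration, not values from this study.

```python
import random


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def rove_level(samples, rove_db=5.0, rng=random):
    """Scale a token by a random overall level drawn uniformly from +/- rove_db.

    Returns the scaled samples and the level change applied, so that overall
    loudness no longer distinguishes signal tokens from noise tokens.
    """
    change_db = rng.uniform(-rove_db, rove_db)
    gain = db_to_gain(change_db)
    return [s * gain for s in samples], change_db


# Usage: rove each presentation independently (seeded here for repeatability)
rng = random.Random(1)
token = [0.0, 0.5, -0.5, 0.25]
roved, applied_db = rove_level(token, rove_db=5.0, rng=rng)
```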

The  stimuli  (or  stimulus  sets)  are  presented  in  intermingled, 
unpredictable  order,  so  that  the  subject,  faced  with  uniform  uncertainty,  will 
tend  to  maintain  the  same  perceptual  process  throughout  the  test  session. 

The same perceptual process thus splits the signal stimuli into
complementary proportions of hits and misses, and the noise stimuli into
complementary proportions of false alarms and correct rejections.
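The split described above can be sketched as a tally over trials. The sketch assumes each trial is recorded as a pair (stimulus, response), with each member either "signal" or "noise". The record format, the function names and the example counts are hypothetical. Within each stimulus type the two proportions are complementary by construction, so only the hit and false-alarm rates need to be returned.

```python
from collections import Counter

# Cells of Table 3.1, keyed by (stimulus, response)
CELLS = {
    ("signal", "signal"): "hit",
    ("signal", "noise"): "miss",
    ("noise", "signal"): "false alarm",
    ("noise", "noise"): "correct rejection",
}


def response_rates(trials):
    """Return (hit rate, false-alarm rate) from (stimulus, response) pairs."""
    counts = Counter(CELLS[(stimulus, response)] for stimulus, response in trials)
    n_signal = counts["hit"] + counts["miss"]
    n_noise = counts["false alarm"] + counts["correct rejection"]
    if n_signal == 0 or n_noise == 0:
        raise ValueError("need at least one signal trial and one noise trial")
    # Miss rate = 1 - hit rate; correct-rejection rate = 1 - false-alarm rate
    return counts["hit"] / n_signal, counts["false alarm"] / n_noise


# Hypothetical session: 10 signal trials and 10 noise trials
trials = (
    [("signal", "signal")] * 8 + [("signal", "noise")] * 2
    + [("noise", "signal")] * 3 + [("noise", "noise")] * 7
)
hit_rate, false_alarm_rate = response_rates(trials)
```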

If we model that perceptual process as one of setting a criterion on an
internal perceptual scale appropriate for the stimulus type (Fig. 3.2 step 1),
then we can see that the subject's "yes" responses fall on one side of the
criterion and the "no" responses on the other (Fig. 3.2 step 2). The researcher,
knowing whether the stimulus was really a signal or really noise, can separate
the yesses into hits and false alarms and the noes into misses and correct
rejections. Then, with the assumption of normally distributed sensory
responses and with the complementary response proportions for the two
stimulus types, the researcher can determine the z-score location of the
criterion within the distribution of each stimulus type, noise and signal,
simply by looking up the proportion in the normal curve tables.

Figure 3.2. The subject's and experimenter's views of the signal detection
experiment.

Since  the  criterion  is  actually  a single  reference  point  on  a single  scale, 
locating  it  in  both  distributions  has  the  effect  of  situating  the  two  distributions 
relative  to  one  another  (Fig.  3.2  step  3).  As  explained  above,  their  separation 
determines  the  subject’s  sensitivity  to  the  difference  between  the  two  stimuli. 
Knowing  the  criterion's  z-score  in  both  distributions  allows  the  researcher  to 
calculate  the  distance  between  the  two  means  in  z-score  (i.e.,  standard 
deviation)  units  (Fig.  3.2  step  4).  This  calculation  is  the  crucial  step  allowing 
perceptual  importance  to  be  judged  on  an  abstract  rather  than  acoustic  scale, 
and  the  resulting  distance  is  d'. 
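Steps 3 and 4 of Fig. 3.2 amount to the familiar formula d' = z(H) − z(F). Here H is the hit rate, F is the false-alarm rate, and z is the inverse of the standard normal cumulative distribution, which replaces the table lookup. The following is a minimal Python sketch under the equal-variance normal assumption, and the function name is hypothetical. Rates of exactly 0 or 1 are rejected because their z-scores are infinite. In practice some correction would be needed for such cells.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity d' = z(H) - z(F) under the equal-variance normal model."""
    for p in (hit_rate, false_alarm_rate):
        if not 0.0 < p < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    # The criterion's z-score within the noise distribution is -z(F), and within
    # the signal distribution it is -z(H); their difference is the separation of means.
    return z(hit_rate) - z(false_alarm_rate)


# Usage with the hypothetical rates H = 0.8, F = 0.3
example = d_prime(0.8, 0.3)
```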

3.3  Structure  of  Cues  in  Speech 

The  application  of  SDT  techniques  to  cues  requires  attention  to 
techniques  for  specifying  a cue's  distribution  and  to  the  patterns  of  cues  in 
phonological  contrasts. 

As  a product  of  a biological  system,  cues  are  produced  with  imperfect 
accuracy,  and  since  they  are  produced  by  all  the  talkers  in  a language 
community,  individual  variation  in  cues  must  be  expected.  The  redundancy 
of  multiple  cues  may  aid  in  overcoming  the  errors  in  perception  that  could 
(and  sometimes  do)  result  from  this  variability.  Nonetheless,  as  elements  in 
a conventionalized  signaling  system,  cues  presumably  have  a target  measure 
that is intended by talkers and expected by listeners. Target and accuracy can
theoretically be described by indices of central tendency and dispersion from
descriptive  statistics. 
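For instance, suppose a set of closure durations has been measured for one category. The target and accuracy could then be estimated as the sample mean and standard deviation, and any token could be located as a z-score relative to them. This is a sketch with hypothetical durations, not data from this study. It uses Python's standard `statistics` module.

```python
from statistics import mean, stdev


def describe_cue(measurements):
    """Estimate a cue's target (mean) and precision (sample standard deviation)."""
    if len(measurements) < 2:
        raise ValueError("need at least two measurements")
    return mean(measurements), stdev(measurements)


def z_score(value, target, spread):
    """Location of a token relative to the cue's distribution, in SD units."""
    return (value - target) / spread


# Hypothetical voiceless-stop closure durations, in ms
durations = [104, 108, 110, 112, 116]
target, spread = describe_cue(durations)
longest_z = z_score(116, target, spread)
```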

However,  it  is  not  possible  to  know  a priori  whether  a cue's  target  is  an 
absolute  measure  or  a proportion,  whether  it  is  specified  on  a physical  or 
psychophysical  scale,  or  what  physiological  and  psychological  processes 
contribute  to  its  variability.  It  is  therefore  unrealistic  to  expect  any  specific 
acoustic  measurement  to  reveal  a strictly  normal  distribution.  The  normal 
distribution  can  still  be  used  by  default  as  a neutral  simplifying  assumption, 
in  the  same  manner  as  it  is  used  in  SDT.  Later  research  can  address  the 
question  of  which  psychophysical  or  mathematical  transforms  best  reveal  the 
underlying  form  of  the  target  and  the  limits  of  its  precision  and  can 
reevaluate  the  data  accordingly. 

There seem to be two basic contrast patterns whereby cues signal the
phonemes (or features) they identify. First, the same cue may exist in both
members of an opposition, but with the distribution of the cue in the two
members centered on different ranges.3 Stop consonant duration is of this
type: voiced stops are short and voiceless stops are long. Second, a cue may
exist uniquely in one member of an opposition, and its distribution is
opposed, in a sense, to the absence of a distribution in the opposing member.
English accented stop burst amplitude is of this type: voiceless stops have a
burst, but voiced stops have none, or a vestigial one at best. These two
patterns can be termed bimodal (potentially, multimodal) and monomodal.4

Figure 3.3. Bimodal and monomodal cue contrasts. This schematic example
illustrates the structural difference hypothesized between the two contrast
types.

Thus,  a cue  has  a characteristic  (and  measurable)  distribution,  and  it 
may  be  opposed  either  to  another  distribution  or  to  none.  Both  of  these 
tenets,  but  especially  the  monomodal  opposition  pattern,  seem  to  indicate 
that  so-called  phoneme  boundaries  are  a mere  epiphenomenon,  derived 
from  the  positivistic  fact  of  the  distributions.  Under  this  conception,  cues  in 
speech  tokens  are  perceived  not  based  on  absolute  measurements,  nor  on 
position  relative  to  some  boundary,  but  according  to  their  assimilability  to  an 
appropriate distribution type in the language. Cue description and
experimentation that fail to take this into account are necessarily flawed or
incomplete.


3.4  Standardizing  Signal  Strength  Across  Languages 

For  typical  sensory  experiments,  the  stimuli  are  fully  specifiable  on 
physical  scales,  such  as  temperature,  weight,  or  length.  Stimuli  along  these 
scales  are  perceived  continuously,  meaning  that  (within  most  of  the 
perceivable  range)  no  particular  regions  on  the  scale  are  privileged,  and 


3This  pattern  is  clearly  generalizable  to  cues  in  a phonemic  series  with  more 
than  two  members. 

4This  terminology  is  intended  to  avoid  terms  with  previous  phonological 
usage, such as the protean "marked/unmarked," or "bilateral" and
"privative," which might have served well here had not Trubetzkoy
(1939/1969) used them with a different meaning.


proportional  differences  can  be  judged  anywhere  along  the  scale.5  The 
stimuli  mean  the  same  thing  (that  is,  they  provoke  the  same  sensations)  for 
all  (nonpathological)  humans,  and  indeed,  for  many  nonhuman  animals, 
since  the  sensation  results  from  identical  or  analogous  biophysical 
transduction  mechanisms  and  neurological  processing. 

The situation for cues is quite different. First, naturally
produced cues clearly occupy well-defined distributions on their scales, with a
describable  central  tendency  and  variability.  Second,  as  is  known  from  the 
extensive  literature  on  categorical  perception  (see  especially  Harnad,  1987), 
proportional  differences  typically  cannot  be  judged  along  the  scale,  but  rather, 
different  stimuli  with  cues  in  the  same  "natural"  distributions  cannot  be 
easily  distinguished.  Third,  as  was  shown  in  the  literature  review  above,  the 
distributions  are  defined  by  linguistic  experience  rather  than  shared  biology  or 
neurology,  so  stimuli  may  not  provoke  the  same  perception  for  all  people;  a 
stimulus may lie on one side of a "phoneme boundary" in one language and
on  the  opposite  side  in  another.  Given  this  situation,  what  can  be  used  as  a 
basis  for  comparison  of  cues  across  languages? 

The  answer  lies  in  recognizing  analogous  expectations  across 
languages.  The  expected  measure  and  the  expected  precision  of  cues  may  be 
considered  to  be  perceptually  equivalent  across  languages,  regardless  of  what 
their surveyed settings may be. For instance, if those expectations are modeled
with standard distributions, two tokens with the mean voiceless stop

5In some cases, this is an oversimplification, as in color perception by cone
receptors via three chemicals with different transduction characteristics, or in
auditory  direction  finding,  where  the  low-frequency  phase  difference 
mechanism  gives  way  to  an  intensity  difference  mechanism  at  frequencies 
high  enough  so  that  the  wavelength  is  shorter  than  the  interaural  distance. 

In  these  cases,  the  discontinuities  have  a biophysical  explanation,  and 
psychophysical  scales  are  specifiable  which  appropriately  redefine  the 
continuity  and  proportionality  of  sensation. 


consonant  duration  may  be  considered  equivalent  in  their  respective 
languages,  even  though  those  languages  may  set  the  means  at,  for  example, 
110  ms  vs.  130  ms.  This  conceptualization  of  cross-language  equivalencies 
may  seem  incompatible  with  a uniformitarian  treatment  of  speech  perception 
or  the  search  for  phonetic  invariance,  but  from  a linguistic  point  of  view,  it  is 
simply  an  application  of  the  idea  of  conventionality  to  speech  cues.  The 
language community and environment determine the typical settings of cues
and the precision required in their pronunciation. These settings are the
phonetic  conventions  that  the  community  uses  for  the  cue  in  question. 


Language    -3 s (ms)    Mean (ms)    +3 s (ms)
Q           98           110          122
R           106          130          154

Figure  3.4.  Equivalent  cue  settings  in  two  hypothetical  languages.  Settings 
illustrate  voiceless  stop  consonant  durations  at  the  mean  and  at  3 standard 
deviations  above  and  below. 

The importance of this conception is that it allows calculation of
equivalent signal strengths for the same cue in different languages. Tokens
with the mean voiceless stop consonant duration may be considered
equivalent for that cue in two languages, even though set at 110 ms and 130
ms, as in the example above. Similarly, tokens 3 standard deviations above
the mean may be considered equivalent, though the standard deviations may
be 4 ms and 8 ms, and the tokens 122 ms and 154 ms, respectively.
Furthermore, shortening closures by 5 standard deviations may be considered
an equivalent change, though the changes are 20 ms and 40 ms, respectively.6
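The arithmetic of these equivalences can be sketched in a few lines. The sketch uses the hypothetical languages Q (mean 110 ms, s = 4 ms) and R (mean 130 ms, s = 8 ms) of Fig. 3.4. The class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueConvention:
    """A language's expected setting (mean) and precision (s) for one cue, in ms."""
    mean_ms: float
    sd_ms: float

    def to_ms(self, z):
        """Acoustic value of a token located z standard deviations from the mean."""
        return self.mean_ms + z * self.sd_ms

    def to_z(self, ms):
        """Location of an acoustic value relative to this language's distribution."""
        return (ms - self.mean_ms) / self.sd_ms

    def shift_ms(self, delta_z):
        """Acoustic size of an edit of delta_z standard deviations."""
        return delta_z * self.sd_ms


Q = CueConvention(mean_ms=110.0, sd_ms=4.0)
R = CueConvention(mean_ms=130.0, sd_ms=8.0)

# Figure 3.4: -3 s, mean, and +3 s in each language
q_row = [Q.to_ms(z) for z in (-3, 0, 3)]  # [98.0, 110.0, 122.0]
r_row = [R.to_ms(z) for z in (-3, 0, 3)]  # [106.0, 130.0, 154.0]
```

A 5 s shortening is then `Q.shift_ms(-5)`, which is -20 ms, and `R.shift_ms(-5)`, which is -40 ms. These are the equivalent changes described above.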

Thus,  by  an  appropriate  conception  of  the  effect  of  language  differences 
on  cue  settings,  it  is  possible  to  make  reasonable,  testable  cross-language 
analogies  regarding  cues.  These  analogies  for  the  first  time  give  the 
experimenter  valid  control  over  signal  strength,  since  they  allow  presentation 
of  acoustically  different,  yet  linguistically  appropriate  and  cue-equivalent 
stimuli  to  speakers  of  different  languages. 

3.5  Testing  Cue  Strengths  Across  Languages 

With  linguistically  valid  control  of  signal  strength,  as  described  above, 
it is possible to imagine linguistically valid single-language cue strength
measures  and  cross-language  cue  comparisons  for  the  first  time. 

Consider  voiceless  stop  consonant  duration  in  the  two  hypothetical 
languages  outlined  above,  "Q"  with  a mean  of  110  ms  and  standard  deviation 
of  4 ms,  and  "R"  with  a mean  of  130  ms  and  standard  deviation  of  8 ms.  One 
could easily design a modified replication of Lisker (1957)7 for each of these
languages,  and  use  a stimulus  series  anchored  to  the  mean,  with  successive 
stimuli  separated  by  (multiples  of)  the  standard  deviation,  instead  of  the 
arbitrary  step  size  of  10  ms. 


6The  terminology  of  descriptive  statistics  uses  "s"  for  standard  deviation  of  a 
sample, and "z" or "z-score" for a location on the standard normal curve, as
measured  in  multiples  of  s from  the  mean.  Thus,  a cue  token  1.5  s below  the 
mean  would  have  a z-score  of  -1.5.  One  could  speak  of  a cue  at  z=3.5  being 
edited  by  s=-4  to  arrive  at  z=-0.5. 

7Recall  from  the  literature  review  that  this  experiment  varied  the  duration  of 
the  bilabial  stops  in  "rupee"  and  "ruby",  and  tested  the  resulting  perception. 


Lisker used stimuli with closures of (40, 50, ..., 140, 150) ms,8 and the
results he reported focused attention on the perceptual crossover between /p/
and /b/ percepts. The replication could present Q speakers with closures of
(70, 74, ..., 106, 110) ms and R speakers with closures of (50, 58, ..., 122, 130)
ms. These are equivalent stimulus sets, constructed such that the first stimulus
is 10 standard deviations shorter than the mean, the next 9, and so on until the
last is the mean itself. The results would focus not on the 50% crossover
(which may or may not be reached), but on differences in perceptual effects of
equivalent stimuli. If at 6 standard deviations below the mean (86 ms for Q
and 82 ms for R), Q speakers still have a 94% probability of a voiceless percept,
but R speakers only a 70% probability, it would seem that R speakers rely
more heavily than Q speakers on closure duration for perception of voiceless
stops, and that Q speakers rely more on other cues instead.
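The equivalent stimulus sets just described can be generated rather than listed by hand. The sketch assumes the hypothetical Q and R settings from the text and uses steps of one standard deviation, from 10 s below the mean up to the mean. The function name is illustrative.

```python
def equivalent_series(mean_ms, sd_ms, steps_below=10):
    """Closure durations from steps_below SDs under the mean up to the mean, one SD apart."""
    return [mean_ms - k * sd_ms for k in range(steps_below, -1, -1)]


q_series = equivalent_series(110, 4)  # 70, 74, ..., 106, 110
r_series = equivalent_series(130, 8)  # 50, 58, ..., 122, 130

# The stimulus six standard deviations below the mean in each language
q_at_minus6 = q_series[10 - 6]  # 86 ms
r_at_minus6 = r_series[10 - 6]  # 82 ms
```

Indexing both series by the same step number is what makes a comparison of response proportions across the two languages an equivalent-stimulus comparison.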

Of  course,  with  the  stimulus  series  as  described,  the  differences  can  be 
compared  anywhere  along  a function  anchored  at  the  mean,  and  stretching 
toward  the  value  of  the  competing  percept  (another  mean  in  the  case  of  a 
bimodal  cue  opposition,  or  perhaps  a zero  for  a monomodal  cue).  The  faster 
this  function  drops  away  from  the  mean,  the  more  the  language  relies  on  the 
information  in  that  cue.  Since  perceptual  crossover  may  not  be  reached, 
different methods must be found for describing and comparing these
functions,  but  these  methods  may  be  developed  along  with  the  different 
experiment  designs  which  can  sample  the  function.  This  lacuna  should  not 
detract  from  the  basic  idea  that: 


8He  had  surveyed  a small  corpus,  and  found  the  closure  of  "ruby"  to  have  a 
mean  of  75  ms  and  a range  of  65-90  ms,  and  "rupee"  a mean  of  120  ms  and  a 
range  of  90-140  ms. 


Cue  strength  can  be  described  and  measured  within  a language,  and 
compared  validly  across  languages,  via  a function  plotting  a token's  location 
relative  to  the  cue's  expected  distribution  versus  its  perceptual  effect  at  that 
location. 


3.6  The  Specific  Need  for  Signal  Detection  Theory 

Thus  far,  perceptual  effect  has  been  spoken  of  in  general  terms,  and  it 
has  been  reasonable  to  consider  it  as  manifested  in  simple  measures,  such  as 
proportions  of  responses  for  the  various  possible  responses.  Unfortunately, 
such  simple  measures  are  inadequate  for  at  least  two  reasons:  first,  because  of 
the  intrinsic  response  bias  in  certain  linguistic  situations,  and  second,  because 
of  the  intractable  lack  of  stimulus  control  in  the  ecologically  requisite 
experimental  design.  A third  reason  arises  from  the  protocol  used  in  this 
experiment.  The  sensitivity  measure  of  signal  detection's  d'  seems  to  be  the 
best  tool  for  simultaneously  addressing  these  problems.9 

Recall  that  one  original  motivation  for  SDT  was  its  ability  to  separate 
purely  sensory  from  more  attitudinal  aspects  of  perception  by  measuring 
them separately as d' and β. One of the attitudinal aspects thus mastered is
expectation.  If  a subject  has  just  responded  "Noise"  to  a long  string  of  stimuli 
even  though  the  experimenter  said  that  signal  and  noise  trials  are  equally 
likely,  that  subject  will  be  increasingly  motivated  to  respond  "Signal"  to 
upcoming  stimuli,  regardless  of  the  actual  nature  of  the  stimulus.  This  is  one 
form  of  response  bias,  the  tendency  of  a subject  to  base  a response  on 
something  other  than  the  perceived  characteristics  of  the  stimulus.  Various 
experimental  designs  were  developed  to  control  and  manipulate  response 
bias,  including  reversing  stimulus  order  for  half  the  subjects,  changing  the 


9See  Macmillan  (1987)  for  other  arguments  in  favor  of  d'  in  speech  research. 


proportion of signal and noise tokens, and instituting rewards (or penalties)
for different kinds of responses. While all of these can change the response bias,
measured as β, and thus change the overall probability of a given response,
they have no effect on the sensitivity, measured as d'.

Now consider the question of response bias with regard to research on voicing in English fricatives. If presented with the appropriate stimuli, subjects should have no difficulty responding with either member of the /f-v/ and /s-z/ oppositions. Unfortunately, this is not the case for /ʃ-ʒ/. The /ʃ/ phoneme is common in all positions (initial, intervocalic, and final), and it is found in some clusters. The /ʒ/, on the other hand, is common only intervocalically, in words derived historically from a palatalization of /z/, such as vision and measure. It occurs in final position only in words borrowed (fairly) recently from French, such as "rouge" and "prestige." It never occurs initially (the names "Jacques" and "Zsazsa" being foreign) or in clusters (unless the affricate /dʒ/, also written /ǰ/, is to be analyzed as a cluster). Thus, there is good reason to believe that /ʃ/ and /ʒ/ would not be equally available and natural to subjects as responses in a perceptual test. A similar situation may exist for /θ-ð/, since /ð/ is extremely common, but chiefly in grammatical words such as the determiners the, these, their, etc. Of course, the most extreme situation exists in certain other languages, where a "gap" exists in a phonemic pattern: Arabic, for instance, has contrasting voiced and voiceless stop series, but no /p/ opposing the common /b/. It would be extremely difficult to explain, predict, and validly measure any response bias due to such lexical and grammatical influences. The obvious solution is to use d', the sensitivity measure, and thereby eliminate such bias from concern.
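The claim that d' is unaffected by the criterion shifts that response bias produces can be checked numerically. The sketch below assumes the textbook equal-variance Gaussian model of SDT (noise sensations distributed N(0, 1), signal sensations N(d', 1)); the function names are illustrative and not part of this study's software. It moves the response criterion, as a lexically or grammatically induced bias might, and then recovers d' and β from the resulting hit and false-alarm rates.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution, mean 0 and SD 1


def predicted_rates(d_prime, criterion):
    """Hit and false-alarm rates of an equal-variance Gaussian observer.

    Noise sensations ~ N(0, 1), signal sensations ~ N(d_prime, 1); the
    observer responds "Signal" whenever a sensation exceeds `criterion`.
    """
    hit = 1 - STD.cdf(criterion - d_prime)
    false_alarm = 1 - STD.cdf(criterion)
    return hit, false_alarm


def sensitivity_and_bias(hit, false_alarm):
    """Recover d' = z(H) - z(FA) and beta, the likelihood ratio at the criterion."""
    z_hit, z_fa = STD.inv_cdf(hit), STD.inv_cdf(false_alarm)
    d_prime = z_hit - z_fa
    beta = STD.pdf(z_hit) / STD.pdf(z_fa)
    return d_prime, beta


# Shift the criterion as a response bias would; the true d' is fixed at 1.5.
for criterion in (0.5, 0.75, 1.0, 1.5):
    hit, fa = predicted_rates(1.5, criterion)
    d_prime, beta = sensitivity_and_bias(hit, fa)
    print(f"c={criterion:.2f}  H={hit:.2f}  FA={fa:.2f}  d'={d_prime:.2f}  beta={beta:.2f}")
```

As the criterion becomes stricter, both response proportions fall and β rises (it equals 1 at the unbiased criterion c = d'/2), while the recovered d' stays at 1.50 throughout.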

The second problem, that of stimulus control, has been dealt with in much of the experimental literature in one of two ways (cf. section 2.6). In some cases, a naturally produced token has been taken as a "prototype," and a series of stimuli has been produced by editing different values of the experimental acoustic variable into multiple copies of the prototype. Lisker (1957) is an example, wherein different durations of silence were spliced into the intervocalic hold position of copies of a single natural token. In other cases, a series of completely synthetic tokens has been constructed, with the experimental variable set to different values and all other values held steady. Lisker and Abramson (1970) is an example, using a Haskins synthesizer to generate a VOT continuum. Indeed, the Klatt synthesizer (Klatt, 1980) is expressly designed to ease the creation of such series, with perceptual tests as a goal.

Unfortunately, despite the practical attraction of these methods, their design has an important flaw. Whether the prototype is natural or synthetic, the nonexperimental cues are artificially fixed at some particular value. Given that cue inventories are imperfectly formulated and potentially open sets (see Lisker, 1986), it is impossible to know whether some cue was inadvertently fixed at an extreme value. Such a situation would clearly make it impossible to rule out response bias in a perceptual test relying on simple response proportions. This argues again for the use of d'.

Another way out of this problem would be to give up the technique of generating a series from a single prototype. Using a large number of natural tokens, each modified to a desired setting and each used only once, would in principle allow the nonexperimental cues to vary according to their own natural distributions. This design, however, could require an impracticable number of tokens, since each must be painstakingly constructed for each setting of the experimental cue, and it still does not guarantee that bias effects will balance out. A safe compromise is to use a smaller number of single-use natural tokens together with the d' sensitivity measure. Any bias not avoided by the use of natural tokens would be eliminated by the use of d'. This is the solution employed in this research.

The  third  problem  arises  from  the  nature  of  the  stimuli  in  the 
experimental  protocol  used  here.  The  framework  underlying  the  experiment 
presumes  that  listeners  have  a representation  in  memory  of  a cue's  expected 
measure  and  expected  accuracy.  Optimally,  perceptual  tests  of  cues  would  use 
stimuli set to a specific level of expectation. Using the standard normal curve as a model, this would be implemented by setting different cues to the same z-scores in their respective distributions. Results would then consist simply of the identification curves obtained by joining the proportions observed at the various settings.
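A minimal sketch of the z-score implementation just described, assuming (as a stand-in for the unknown memory representation) that a cue's expected distribution can be estimated from corpus measurements like those of Experiment 1. The functions locate a token in its cue's distribution and, conversely, find the cue value corresponding to a chosen z-score. The function names are hypothetical, and the measurement values are invented round numbers for illustration, not data from this study.

```python
from statistics import mean, stdev


def z_score(value, sample):
    """Location of `value` in the distribution estimated from `sample`, in SD units."""
    return (value - mean(sample)) / stdev(sample)


def value_at_z(z, sample):
    """Cue value lying `z` standard deviations from the sample mean."""
    return mean(sample) + z * stdev(sample)


# Hypothetical measurements of two different cues on /t/ tokens.
closure_ms = [110, 120, 130]     # consonant duration, in milliseconds
burst_db = [-30, -26, -22]       # burst amplitude, in dB (arbitrary reference)

# Setting both cues one SD toward /d/ (shorter, weaker) equates their
# expectation level even though their physical units and ranges differ.
print(value_at_z(-1.0, closure_ms))  # 110.0
print(value_at_z(-1.0, burst_db))    # -30.0
```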

This optimal situation is unattainable at present. For one thing, it would require a level of understanding of the form and content of the memory representation of cues which is not currently available. For another, it would require, for many cues, extraordinarily sophisticated signal processing capabilities in order to produce the requisite modifications on the large corpus of natural stimuli needed, without degrading their perceived naturalness. There is, however, a reasonable alternative approach: to select the best-defined measurement and modification techniques, then to test perception according not to the resulting cue setting, but rather to the level of modification. Since d' is independent of location on the internal perceptual scale, it is well suited to testing effects of level of modification. If, at a later date, the measurement or modification techniques are shown to be inappropriate, the stimuli can be reassessed and the resulting effects adjusted accordingly. The same holds should the memory representation of expected cue distributions become determinable.




3.7  Recapitulation 

SDT  springs  from  the  understanding  that  repeated  presentation  of  the 
same  stimulus  results  not  in  identical  sensations,  but  in  a distribution  of 
sensations.  Discriminability  of  two  signals  is  determined  by  the  separation  of 
the  sensory  responses  they  provoke.  Capitalizing  on  the  structure  of  these 
distributions, SDT is able to quantify sensitivity (as d') and response bias (as β) separately.

A fundamental  problem  in  applying  SDT  to  cross-language  cue 
research  is  determination  of  what  can  count  as  identical  stimuli  across 
languages.  This  is  a problem  because  research  has  shown  that  the  same  cue 
may  occupy  completely  different  ranges  of  measurement  in  different 
languages.  The  solution  proposed  here  is  to  recognize  that  all  cues,  whatever 
their  nature,  have  an  expected  distribution,  with  an  expected  value  and  an 
expected  precision.  Across  languages  (or  indeed  across  cues),  tokens  with 
equivalent  expectation  (i.e.,  tokens  from  analogous  locations  in  their 
respective  distributions)  can  be  taken  as  having  equal  signal  strength  in  their 
respective  languages. 

Using this cross-language control of signal strength, it is possible to judge, and ultimately quantify, the overall importance of a given cue, and then to compare its importance across languages. Basically, given a series of stimuli of equal signal strengths in two languages, the language in which identification rates change less uses the information in that cue less, and the other language correspondingly more. However, response bias, due at least to phonological structure and to the difficulties of experiment design in multiply cued contrasts, makes it impossible to use simple identification rates, and makes the sensitivity measure d' a better choice for judging perceptual effect.


CHAPTER  4 

PROCEDURAL  OVERVIEW 


In order to operationalize and demonstrate the use of the measure described in the preceding chapter, a two-part experiment was performed. The first part consisted of phonetic measurement of a set of cues in a corpus; the second was a perceptual test of the appropriately edited corpus, with application of the measurement scheme. For descriptive convenience, the two parts will be considered as separate experiments, but this chapter considers the aspects common to both: the overall experiment design and the corpus.


4.1  Experiment  Design 

It  has  been  established  (see  Chap.  2)  that  speech  perception  depends  on 
multiple  cues,  and  that  cues  may  have  different  expected  distributions  in 
different  languages.  This  study  proposes  that  the  importance  of  a given  cue 
in  a given  language  (dialect,  phonemic  environment,  etc.)  can  be  measured  by 
signal  detection's  d',  with  signal  strength  normalized  on  the  cue's 
distribution.  The  normalization  process  requires  the  recording  and 
measurement  of  a corpus  instantiating  the  cues  of  interest,  in  order  to  obtain 
the  descriptive  statistics  on  the  cues'  distributions.  This  fairly  routine 
phonetic  survey  is  experiment  one. 

Experiment two, the perceptual test, is much more complex. The calculation of d' requires the determination of two error rates: misses and false alarms.1 This in turn requires the precise definition and design of signal and noise stimuli, the design of their presentation to subjects, and the design of the subjects' responses. Finally, once the test is given, d' can be calculated and any experimental hypotheses examined.
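For a yes/no design such as this one, the usual formula is d' = z(H) - z(FA), where H = 1 - (miss rate), FA is the false-alarm rate, and z is the inverse of the standard normal cumulative distribution. The sketch below computes d' from the four response counts. Rates of exactly 0 or 1 would give infinite z-scores; clamping them to 1/(2N) and 1 - 1/(2N) is one common convention and is an assumption here, not necessarily the adjustment used in this study. The function name is hypothetical.

```python
from statistics import NormalDist


def d_prime_from_counts(hits, misses, false_alarms, correct_rejections):
    """Yes/no sensitivity d' = z(H) - z(FA) from the four response counts."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    def clamped_rate(k, n):
        # Keep the proportion inside [1/(2n), 1 - 1/(2n)] so z() stays finite.
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))

    hit_rate = clamped_rate(hits, hits + misses)  # signal trials
    fa_rate = clamped_rate(false_alarms, false_alarms + correct_rejections)  # noise trials
    return z(hit_rate) - z(fa_rate)


print(round(d_prime_from_counts(30, 20, 10, 40), 2))  # H = 0.6, FA = 0.2 -> 1.09
```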

In  a typical  phonetic  perception  experiment,  with  a stimulus  series 
based  on  a synthetic  or  natural  prototype,  it  is  hard  to  imagine  a justification 
for  splitting  the  series  into  signal  and  noise  categories.2  However,  it  has  been 
noted  above  (section  2.6)  that  such  experiment  designs  are  likely  to  render 
skewed  results,  and  that  single-use  natural  stimuli  may  be  preferable.  So  the 
question  becomes:  How  can  natural  stimuli  be  designated  as  signal  and  noise, 
so that their (mis)perception delivers the appropriate error rates?

A naturally produced utterance must be presumed to have exactly the phonemic structure intended by the talker. A (cooperative, normal) talker will, with appropriate instruction, correctly produce intervocalic /t/, and not /d, p/, etc. The identity of the /t/ then depends on the talker's judgment that she has correctly produced the target,3 and not on any particulars of the acoustic-phonetic result. Since the various distinctive features are redundantly encoded by multiple cues, as voicing is by those of the Lisker List, listener misperception of /t/ as another phoneme is fairly unlikely, though it does occur.

However, if a researcher artificially shortens the naturally produced /t/, it is rendered more /d/-like, and listeners will be more likely to perceive it as a /d/. We can declare that the shortening of the /t/ is a signal, and "mis"perception of the shortened token as /d/ constitutes a hit, since it implies detection of the signal. Continued (correct!) perception of the shortened token as /t/ is a miss. The unmodified /t/s are thus noise trials by default: an unmodified /t/ misperceived as /d/ is a false alarm, and one rightly perceived as /t/ is a correct rejection.4

1Recall that the four basic response categories are hits and misses for signal trials, and false alarms and correct rejections for noise trials.

2In fact, the calculation of cumulative d' (Macmillan & Creelman, 1991) is particularly suited to such situations, since it does not presume the qualitative differences between stimuli that justify separate designations of signal and noise. This is discussed further in section 6.3.

3The talker presumably recognizes production errors as mispronunciations.

Since only one of the multiple cues is modified per token, the great majority of listener percepts is likely to remain /t/, so any manipulation of response proportions5 must be accomplished by including naturally produced /d/s in the stimulus set. Efficiency suggests combining the "short /t/" experiment with a similarly designed experiment on a cue in the /d/ corpus, for instance a "damped /d/ closure voicing" experiment. Since the edits produce only a small proportion of misperceptions in either direction, the stimuli from the two experiments can be randomized together to balance the response proportions, and the subjects can use the same "T/D" response set.

The  pattern  for  each  cue  is  therefore  the  following:  A single  cue  is 
isolated  and  modified  in  such  a fashion  as  to  reduce  the  token's  contrast  with 
an  opposing  phoneme.  This  modification  defines  the  signal  tokens,  which 
listener  perception  divides  into  hits  and  misses.  Unmodified  tokens  are 
noise,  and  their  perception  gives  false  alarms  and  correct  rejections.  The 
amount  of  modification  is  the  signal  strength  and  is  set  by  the  researcher  with 
reference  to  prior  phonetic  surveys,  such  as  experiment  one. 
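The mapping from stimulus type and listener response onto the four signal detection outcomes can be written as a small function. This is a hypothetical illustration (the function and argument names are not from the study); it assumes a two-choice T/D response set and that every edit pushes a token toward the opposing phoneme.

```python
def classify(modified: bool, intended: str, response: str) -> str:
    """Signal detection outcome of one trial in a single-cue editing experiment.

    modified -- True if one cue was edited toward the opposing phoneme (signal trial)
    intended -- phoneme the talker produced: "t" or "d"
    response -- listener's choice: "t" or "d"
    """
    detected = response != intended  # percept crossed to the opposing category
    if modified:
        return "hit" if detected else "miss"
    return "false alarm" if detected else "correct rejection"


# A shortened /t/ heard as /d/, and an unedited /t/ heard as /d/:
print(classify(True, "t", "d"))   # hit
print(classify(False, "t", "d"))  # false alarm
```

Because detection is defined relative to the intended phoneme, the same function serves the /d/ side of the design: a /d/ with damped closure voicing heard as /t/ also counts as a hit.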

4Note  that  "errors"  as  defined  by  signal  detection  theory  (i.e.,  misses  and  false 
alarms)  are  not  equivalent  to  what  the  listener  would  functionally  define  as 
errors:  any  token,  modified  or  not,  that  is  perceived  other  than  as  the  speaker 
intended.  Again,  the  point  is  moot  if  cumulative  d'  is  used. 

5Stimulus  proportions  are  known  to  affect  response  bias. 




Stimulus  presentation  can  include  sets  of  stimuli  manipulating 
different  cues  on  both  sides  of  a phonemic  contrast.  These  sets  can  be 
randomized  together  in  order  to  control  response  proportion.  The  subject  can 
then  be  given  a 2-alternative  forced  choice  identification  task,  with  the  2 
contrasting phonemes as the response alternatives.6 Once the data are collected, the researcher sorts the responses appropriately and calculates any d' required.
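The final sorting step might look like the following sketch, which groups trials by cue, intended phoneme, and level of modification, pairs each group with the unmodified tokens of the same intended phoneme (the noise trials), and returns one d' per group. The record layout, the function name, the pooling of noise trials by phoneme, and the 1/(2N) clamping of extreme rates are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import NormalDist


def d_prime_table(trials):
    """Compute d' for every (cue, intended, level) group of signal trials.

    trials -- iterable of (cue, intended, level, response) tuples, where level 0
              marks an unmodified (noise) token and cue may be None for those.
    """
    z = NormalDist().inv_cdf

    def clamped(k, n):  # keep rates off 0 and 1 so z() stays finite
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))

    noise = defaultdict(lambda: [0, 0])   # intended -> [false alarms, trials]
    signal = defaultdict(lambda: [0, 0])  # (cue, intended, level) -> [hits, trials]
    for cue, intended, level, response in trials:
        tally = noise[intended] if level == 0 else signal[(cue, intended, level)]
        tally[0] += response != intended  # a category switch counts as "yes"
        tally[1] += 1

    table = {}
    for (cue, intended, level), (hits, n) in signal.items():
        false_alarms, n_noise = noise.get(intended, (0, 0))
        if n_noise == 0:
            continue  # no unmodified tokens of this phoneme: d' undefined
        table[(cue, intended, level)] = z(clamped(hits, n)) - z(clamped(false_alarms, n_noise))
    return table


trials = (
    [(None, "t", 0, "d")] * 2 + [(None, "t", 0, "t")] * 8                 # unedited /t/
    + [("duration", "t", 1, "d")] * 6 + [("duration", "t", 1, "t")] * 4  # shortened /t/
)
print(round(d_prime_table(trials)[("duration", "t", 1)], 2))  # H = 0.6, FA = 0.2 -> 1.09
```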


4.2  Corpus  Design 

In  the  long  term,  investigation  of  the  role  of  cues  will  require 
comparison  of  cue  behavior  across  a variety  of  cues,  environments,  and 
languages.  To  that  end,  it  is  naturally  preferable  to  investigate  speech 
materials  which  ease  that  comparison. 

Opposition 

The speech materials on which to focus were best selected with reference to the phonetic literature, to generalizability across languages, and to a broad variety of technical approaches. To integrate well into the phonetic literature, the speech materials should be well and often described, and generally uncontroversial. To generalize well, the sounds should be common in the world's phonemic inventories. To allow investigation with varying technical resources, the speech sounds should be rich in cues of varying kinds, so that cues amenable to the available technology can be selected.


6Obviously,  other  presentation  and  response  formats  are  possible,  but  this 
fairly  straightforward  and  convenient  format  is  used  for  this  study. 




The best opposition to study was judged to be that of stop consonant voicing. Stop consonants appear to be universal in human languages, and voicing is involved in the phonetic implementation of the vast majority of languages that oppose 2 or more series of stops.7 Stop consonants are well understood through a vast number of studies, and these studies are not particularly controversial as regards the facts of phonetic structure. Finally, as noted by Lisker (1986) among others, there is a rich set of possible cues to investigate; these can be investigated by various measurement and editing techniques; and they involve all the basic perceptual concerns of psychoacoustics: frequency, intensity, duration, and spectrum.

What position in the word is the most appropriate? Word-medial intervocalic position seemed most advantageous, since its characteristics are open to comparison with all other positions: cluster-, syllable-, word-, and utterance-initial and final. Recall that the Lisker List cues can be classified as occurring before, during, or after stop closure. In general, the postclosure cues occur in initial stops and the preclosure cues in final stops. Only the intervocalic position uses all 3 cue sets, from before, during, and after closure, and thus allows comparison with both initial and final stops. The word-medial (more precisely, "pseudoword"-medial) position allows for comparison with more richly specified morphological and syntactic environments. Only the word-medial intervocalic position allows for all these possibilities.

What place of articulation is most advantageous? Velar stops tend to have double release bursts. This may be a cue in its own right, but in any case it raises problems in timing measurements. Both bilabial and velar places tend to have gaps in the stop series in the phonologies of the world's languages, with either or both of /p, g/ missing. For both these reasons, the apical place was chosen.

7Maddieson (1984) reports that 94.2% of the UPSID languages contrast 2 or more stop series, and of those languages over 90% oppose 2 or more series by voicing or VOT contrasts.

Thus,  the  study  targeted  word  medial  intervocalic  /t,  d/. 

Cues 


The selection of cues on which to focus seemed best made in relation to cue oppositional structure, basic perceptual processes, technical ease, and response balance. Both bimodal and monomodal cues (see section 3.3) should be tested in order to allow comparison of the two structures. Perception of frequency, amplitude, duration, and spectrum involves fundamentally different processes, so cues involving at least two of these processes should be tested. The large number of measurements and modifications foreseen dictated the use of cues that were fairly easy to measure and modify. Finally, balance of responses dictated that cues on both sides of the /t~d/ contrast be used.

The cues chosen for study were consonant duration, burst amplitude,8 and (closure) voicing amplitude. Consonant duration is a bimodal cue, since voiced and voiceless stops both have durations, and these differ, voiceless being longer than voiced. Burst amplitude is monomodal, since voiceless stops have a clear burst and voiced stops little to none. Voicing amplitude is monomodal, since voiced stops have distinct periodicity throughout,9 while in voiceless stops periodicity quickly rolls off to a negligible level. Both cue opposition patterns are thus tested.

8Technically, this cue concerns the entire "burst-plus-aspiration" aperiodic segment. (See Repp, 1979, for discussion.) This will be referred to simply as the burst, partly for convenience, and partly because French voiceless stops are not typically analyzed as aspirated.

9In the occasional exception to this rule, there will be a brief dip in voicing amplitude, on the order of tens of milliseconds, immediately before consonant release.

The  consonant  duration  cue  clearly  involves  the  perception  of 
duration.  The  burst  amplitude  and  voicing  amplitude  cues  both  involve 
perception  of  amplitude.  Two  different  basic  perceptual  processes  are  thus 
invoked. 

With  natural  stimulus  tokens,  the  measurement  of  spectral  and 
frequency  cues  is  not  especially  difficult,  and  their  modification  is  simple 
when  creating  stimuli  from  scratch  using  appropriate  synthesizers.  However, 
the  modification  of  such  cues  in  natural  tokens  while  maintaining 
naturalness  would  require  computational  materials  and  training  not 
commonly  available  in  the  phonetics  community.  On  the  other  hand,  both 
measurement and modification of duration and amplitude cues are now fairly
easy with software on the market for personal computers, and are thus
commonly  available  to  phonetics  and  speech  researchers.  The  cues  selected 
are  thus  sufficiently  easy  to  measure  and  modify. 

Longer duration and burst amplitude are cues to the voicelessness of /t/, while shorter duration and voicing amplitude are cues to the voicedness of /d/. Presuming analogous stimulus sets for all four cues and roughly similar perceptual effects for analogous stimuli across cues, the proportions of "D" and "T" responses should be approximately equal, posing no problem of response bias.

Thus,  the  study  targeted  duration  and  burst  amplitude  of  /t/,  and 
duration  and  voicing  amplitude  of  /d/. 




Language 

The  language  used  was  French,  for  several  reasons.  First,  the  French 
phonological  voicing  distinction  is  typically  described  as  a fairly 
straightforward  phonetic  voicing  distinction  (Casagrande,  1984;  Tranel,  1987). 
In  contrast,  the  English  phonological  distinction  is  sometimes  described  as 
voicing,  sometimes  as  tense  versus  lax.  The  implementation  of  this 
distinction  usually  involves  phonetic  voicing,  but  for  initial  stops  in  stressed 
syllables  aspiration  is  also  crucial,  as  is  vowel  length  in  the  distinction 
between  obstruents  in  the  coda  position.  Second,  French  does  not  have  the 
complications  of  neutralization  following  /s/,  nor  of  intervocalic  flapping  of 
apical  stops  in  post-stress  position.  Last,  in  the  crucial  case  of  pre-stress 
intervocalic  stops,  which  is  the  focus  of  this  study,  the  phonetic  structure  of 
the  phonological  opposition  is  quite  comparable  in  French  and  English. 
Specifically,  there  is  nothing  phonologically  exceptional  about  the 
circumstances  in  either  language,  and  both  languages  can  be  expected  to  use 
most  of  the  Lisker  List  of  voicing  cues. 

The  phonological  and  phonetic  simplicity  of  French  makes  it  a good 
candidate  for  an  early  study  such  as  this,  since  the  findings  should  be 
illustrative  of  patterns  in  many  other  similarly  structured  languages.  The 
crucial  point  of  similarity  with  English,  which  is  encoded  in  the  corpus 
studied  here,  provides  a point  of  contact  for  interesting  future  contrastive 
studies,  since  French  and  English  voicing  are  phonologically  similar,  but 
phonetically  different. 




4.3  Corpus 

The  preceding  sections  dealt  with  the  design  of  the  study.  This  section 
documents  the  details  of  the  corpus  recorded  to  instantiate  that  design. 

Tokens 

The  study  focused  on  intervocalic  /t~d/,  so  the  tokens  needed  the 
structure /...VCV.../. In order to facilitate extraction of the token from the
carrier  sentence,  and  to  allow  easier  vowel  duration  measurement  if  needed, 
this  structure  was  capped  by  consonants,  resulting  in  the  structure:  /CVCVC/. 

To  be  representative  of  the  entire  vowel  space,  but  without  excessive 
multiplication  of  stimuli,  the  vowels  /i,  a,  u/  were  used,  and  the  two  vowels 
in  each  /CVCVC/  token  were  identical. 

The  initial  consonant  was  always  /d/,  and  the  final  consonant  was 
always  /t/,  for  several  reasons:  The  listeners  did  not  have  the  added 
distraction  of  changes  in  place  of  articulation.  The  listeners  had  a stable 
consonantal  frame  within  which  to  focus  on  the  target  consonant.  The 
listeners  were  not  cued  to  the  target  consonant  identity  by  the  frame.  For  the 
talker,  the  initial  /d/  reinforced  voicing  of  the  first  vowel,  which  might 
otherwise  have  suffered  devoicing  from  the  French  accent-final  pattern. 

With the /d/ first and /t/ last (and either one medially), all tokens switch from voiced to voiceless consonants, differing only in the location of the shift.10 Lastly, all of the resulting tokens were nonwords in French, and thus would not be expected to invoke further, possibly complicating, levels of linguistic processing.

10This  might  make  a difference  for  the  control  processes  implied  by  certain 
phonologies,  e.g.,  Browman  and  Goldstein  (1986)  and  McCarthy  (1981). 




Thus  the  stimulus  tokens  were 


/didit/  /dadat/  /dudut/  /ditit/ /datat/  /dutut/ 


Talker 


The  researcher  sought  a talker  whose  French  seemed  as  average  and 
unaffected  by  foreign  language  influence  as  possible.  The  talker  chosen  was  a 
female  in  her  mid-twenties,  a native  speaker  of  French,  with  no  noticeable 
regional  or  social  accent,  nor  any  other  speech  idiosyncrasies  apparent  to  the 
researcher.  Her  parents  are  native  French  citizens,  and  spoke  neither  foreign 
languages  nor  local  dialect.  She  had  never  left  France  for  periods  of  more 
than  a few  weeks  for  tourism.  Though  a graduate  student  in  phonetics,  she 
was  not  a willing  student  of  languages,  speaking  none  fluently  but  French, 
and  using  only  such  English  as  was  either  current  among  the  French  speaking 
public  or  was  unavoidable  in  her  field,  where  much  of  the  standard  jargon  is 
English-based  and  articles  are  read  in  their  original  English.  In  short,  the 
researcher  found  nothing  in  her  speech  or  her  language  background  that 
could  prevent  her  being  characterized  as  a typical  native  speaker  of  French. 

Recording 

The full corpus consisted of 25 repetitions each of these 18 nonsense words:11

/ditit/ /datat/ /dutut/ /didit/ /dadat/ /dudut/
/bipip/ /bapap/ /bupup/ /bibip/ /babap/ /bubup/
/gikik/ /gakak/ /gukuk/ /gigik/ /gagak/ /guguk/

as well as 2 repetitions each of these 18 nonsense words:12

/vifif/ /vafaf/ /vufuf/ /vivif/ /vavaf/ /vuvuf/
/zisis/ /zasas/ /zusus/ /zizis/ /zazas/ /zuzus/
/ʒiʃiʃ/ /ʒaʃaʃ/ /ʒuʃuʃ/ /ʒiʒiʃ/ /ʒaʒaʃ/ /ʒuʒuʃ/

These last 18 words use fricatives instead of stops, but otherwise follow the same pattern as the stop corpus.

11The bilabial and velar series were recorded for possible later use.

The  entire  corpus  was  randomized  together,  with  the  exception  of  1 
repetition  each  of  the  fricative  tokens.  These  were  randomized  separately  and 
incorporated  at  the  beginning  of  the  corpus,  to  be  read  first  as  part  of  the 
talker’s  accommodation  to  the  task  and  then  discarded. 
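The construction and ordering of the reading list can be summarized in a short sketch. The word patterns follow the lists above; the names are hypothetical, and the use of Python's `random.shuffle` (with an optional seed for reproducibility) is an assumption, since the randomization method actually used is not specified.

```python
import random

VOWELS = ["i", "a", "u"]
# (initial, voiceless medial, voiced medial, final) for each place series
STOP_SERIES = [("d", "t", "d", "t"), ("b", "p", "b", "p"), ("g", "k", "g", "k")]
FRICATIVE_SERIES = [("v", "f", "v", "f"), ("z", "s", "z", "s"), ("ʒ", "ʃ", "ʒ", "ʃ")]


def nonsense_words(series):
    """All /CVCVC/ words: two identical vowels, medial consonant voiceless or voiced."""
    return [f"{initial}{v}{medial}{v}{final}"
            for initial, voiceless, voiced, final in series
            for medial in (voiceless, voiced)
            for v in VOWELS]


def reading_list(seed=None):
    """Warm-up fricatives first (read, then discarded), then the randomized corpus."""
    rng = random.Random(seed)
    fricatives = nonsense_words(FRICATIVE_SERIES)
    warm_up = fricatives[:]  # first fricative repetition, shuffled separately
    rng.shuffle(warm_up)
    main = nonsense_words(STOP_SERIES) * 25 + fricatives  # second fricative repetition
    rng.shuffle(main)
    return warm_up + main


print(len(reading_list(seed=1)))  # 18 + 18 * 25 + 18 = 486
```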

The materials recorded by the talker were: "La Bise et le soleil", a short fairy tale from the IPA (1949, see Appendix A); a set of numbers intended for possible use in informing the listeners of the stimulus number, but ultimately not used; a short text of general instructions to listeners; and the full corpus.

The materials other than the corpus tokens were read from typed texts by the talker. The corpus tokens, however, were read from a typed list not by the talker but by the researcher, who read them softly to the talker rather than into the microphone. The talker's response upon hearing a token was to repeat it into the microphone in the sentence frame "C'est ___, encore?", with question intonation and with (ideally) just enough pause before and after the token itself to break vocalization.

This cueing of the talker served two purposes. It relieved her of having to follow her progress in the list of stimuli, leaving that to someone who could attend to nothing else. It also helped her maintain a relatively natural utterance style, by limiting her ability to anticipate and preplan utterances, by requiring her to pay attention in order to repeat the stimulus correctly, and by giving the researcher opportunities to break the hypnotic "list" monotony when needed by, for instance, making faces. While the cueing process adds the risk that the talker could misunderstand a cue, any such error would be caught in the token verification process discussed below.

12The fricative series were recorded for later use as a training set for listeners, but ultimately were not used.

The insertion of the nonsense word in this sentence frame allowed the use of sentence prosody to control loudness and intonation, thus avoiding the characteristic drone of list intonation. Question intonation was chosen to impose a single intonation pattern on the experimental token, avoiding the several possible intonational variants a declarative sentence would allow. Specifically, question intonation imposes a slight rise in pitch and a slight increase in loudness, whereas in a declarative this pattern is only one of several possibilities. The tag word "encore" displaces the more variable final intonation away from the experimental word. It may also bear whatever effect list intonation may have across sentences.

The  pause  at  the  beginning  and  end  of  the  test  token,  while  not  crucial, 
was  intended  to  aid  in  segmentation  and  measurement  of  the  initial  and  final 
consonants,  neither  of  which  was  crucial  to  this  experiment.  More 
importantly,  it  kept  the  listeners  in  the  perceptual  tests  from  any  impression 
that  the  tokens  were  "cut  out"  of  running  speech,  which  might  have  been 
distracting  to  them. 

Token  Verification 

Within the week after the recording session, the recording was submitted to the talker so that she could judge for herself whether the word she spoke was indeed the one intended. She eliminated several tokens as mistakes. Soon thereafter, the researcher did the same critical review and submitted several tokens to her as questionable; she agreed, and those tokens were eliminated. A few more tokens were eliminated during the measurement stage. As a result, certain word classes each lost a few of the 25 planned repetitions. On the plus side, a few supplementary repetitions resulted from backtracking in the corpus list at a tape change during recording. In the end, the number of usable tokens varied from 23 to 25 in the apical word classes ultimately used in the perception experiments.

Digitization 

Analog-to-digital conversion was done using an Audiomedia card and its accompanying software, installed on an Apple Macintosh IIfx computer. The full carrier sentences were digitized at a sampling rate of 16,000 Hz, and the individual speech tokens were extracted from them with the digital editor in Audiomedia.


CHAPTER  5 

EXPERIMENT  1:  PHONETIC  SURVEY 


This chapter details the methods and results of Experiment 1, the measurement of the four cues used in this study. The following chapter, on the methods of Experiment 2, explains how these results are used to guide the creation of perceptual stimuli by editing the tokens measured here.

5.1 Measurement Philosophy and its Implementation

Cues have historically been measured in a variety of ways, according to the researchers' needs and available technology. For this study, they were measured with techniques adapted from the methods developed at the Institut de la Communication Parlee (ICP) by a team under the direction of C. Abry (Abry, Benoit, & Sock, 1985; Abry, Benoit, Boe, & Sock, 1985). The basic philosophy is that certain instantaneous (i.e., durationless) acoustic events with known articulatory causes are labeled in the acoustic signal. These events serve as landmarks, typically as the beginning and end points of (relatively) homogeneous segments to be studied.

For this study, a subset of the ICP events was hand-labeled in each token of the corpus. Table 5.1 gives some details about these events, and Figure 5.1 shows their location in example tokens of [dadat] and [datat]. The events in parentheses in the table were not used in this experiment.




Table 5.1.

Phonetic Events and their Labels

                          Position
Phenomenon                Onset    Termination   Phenomenon Characterized
Voice                     VO       VT            Glottal periodicity
Frication                 (FO)     (FT)          Glottal or supraglottal aperiodicity
Vocalic Voice             VVO      VVT           Formant structure
Consonantal Voice         (CVO)    (CVT)         Shifts within voiced consonants and clusters
Consonantal Frication     CFO      (CFT)         Shifts within voiceless consonants and clusters

Figure 5.1. Location of events in sample tokens of /dadat/ and /datat/ (time axis in ms).

Hand-labeling of the events was unavoidable in the absence of available signal processing routines capable of detecting these events.1 Naturally, such human intervention is subject to variability due to several factors, and some events were less susceptible to this variation than others. For instance, the CFO (Consonantal Frication Onset) marking stop consonant release is practically unmistakable. On the other hand, the VVT (Vocalic Voice Termination) marking the end of formant-rich vowel voicing before a stop consonant is almost always a judgment call. Three techniques were used to limit such variation.

1A useful automatic segmentizer remains an unfulfilled dream of phonetics researchers. See Andre-Obrecht (1988) and Feng, Achab, and Combescure (1991) for ongoing work in the area.

First,  though  the  labeling  was  tied  directly  to  the  raw  speech  signal, 
multiple  representations  derived  therefrom  were  used  as  needed  to  help 
reveal  the  more  significant  aspects  of  the  signal.  This  was  done  using 
Signalyze,  version  2.0,  an  acoustic  signal  processing  software  package  for  the 
Apple  Macintosh,  designed  by  Eric  Keller  primarily  for  speech  research 
(Keller,  1991  & 1994).  (Signalyze  was  particularly  well  suited  to  this  task,  since 
it  allows  simultaneous  viewing  of  multiple  time-aligned  signal 
representations.) For instance, a spectrographic representation was often
useful in helping determine VVO (Vocalic Voice Onset), and high-pass filter
output  was  sometimes  helpful  in  addition  to  the  spectrogram  in  determining 
VVT  (Vocalic  Voice  Termination). 

Second,  before  labeling  the  research  corpus,  the  researcher  practiced  the 
same  labeling  techniques  to  a level  of  perceived  proficiency  using  a different 
corpus  with  the  same  phonetic  patterns.  This  was  done  so  that  any  variation 
associated  with  adaptation  to  or  learning  of  the  labeling  task  would  be  over 
before  it  could  impact  the  real  corpus.  Furthermore,  the  practice  was 
supervised  by  Rudolph  Sock,  the  most  experienced  labeler  at  the  ICP. 

Third, in order to minimize criterion fluctuation due to forgetfulness or confusion, like events were labeled all at once across the corpus. For instance, all the VVTs (Vocalic Voice Terminations) were measured in all the tokens, then all the VTs (Voice Terminations), rather than measuring the various events in the /ditit/s, then in the /datat/s, etc. Since this second pattern would require longer maintenance of more subjective criteria in memory, it would presumably result in more variable labeling performance, and the first pattern was therefore preferred.

These  three  techniques  were  used  in  order  to  get  the  labeling  right  the 
first  time.  Section  5.5  details  a follow-up  process  to  identify,  and  either  verify 
or  correct,  any  tokens  that  might  have  been  mislabeled.  Note  that  it  is 
ultimately  not  the  means,  but  rather  the  standard  deviations  of  the 
measurements which are important here (cf. section 5.7). They will be used to
guide the creation of the perceptual stimuli crucial to Experiment 2, while the
means play no role at all in that process.

5.2  Labeling  Events 

This section details the protocols used to mark the five event types used in this study. Close approximations to these protocols were arrived at during the practice sessions discussed above, and the final versions given here reflect the minimal modifications found necessary to accommodate the data during the labeling process. They are presented below in their temporal order through the VCV sequence, which suits our consonantal focus.

VVT  (Vocalic  Voice  Termination) 

This event is intended to mark the boundary between the presence and absence of distinct, relatively stable formant structure in F2 and higher formants. It was marked at a zero crossing. During the relatively long, slow roll-off of the preceding vowel's amplitude, effective loss of formant structure was sometimes quite difficult to judge. In such situations, a transition area was determined, bounded by the earliest and latest candidates for the label, and the label was then set so as to split this transition area as evenly as possible. Besides the raw signal, the representations most often used were the spectrogram, the high-pass filter output with the cutoff set at 2 kHz, and the RMS amplitude envelope of the high-pass filter output.
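The VVT placement rule can be written as a short procedure: bound a transition area by eye, split it evenly, then put the label on a zero crossing. The sketch is illustrative only. It assumes the labeler has already chosen the earliest and latest candidate sample indices, and it assumes the corpus's 16 kHz sampling rate. The function names are hypothetical. In this study the labeling itself was done by hand in Signalyze.

```python
from typing import Sequence

FS = 16000  # sampling rate of the digitized corpus (Hz)


def samples_to_ms(n: int, fs: int = FS) -> float:
    """Convert a sample index (or sample count) to milliseconds."""
    return 1000.0 * n / fs


def zero_crossings(x: Sequence[float]) -> list[int]:
    """Indices i at which the sign differs between samples i-1 and i."""
    return [i for i in range(1, len(x)) if (x[i - 1] < 0.0) != (x[i] < 0.0)]


def place_vvt(x: Sequence[float], earliest: int, latest: int) -> int:
    """Split the transition area [earliest, latest] evenly, then snap to the
    zero crossing inside the area that lies nearest that midpoint. If the
    area contains no zero crossing, fall back to the rounded midpoint."""
    mid = (earliest + latest) / 2
    inside = [i for i in zero_crossings(x) if earliest <= i <= latest]
    if not inside:
        return round(mid)
    return min(inside, key=lambda i: abs(i - mid))
```

Snapping to the crossing nearest the midpoint keeps the label's placement faithful to the "split evenly" rule. It still satisfies the requirement that labels fall on zero crossings.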

VT (Voice Termination)

This event is intended to identify the instant of loss of voicing, which in this study occurs during the closure of a voiceless stop consonant. It was marked at a rising zero crossing. During the final roll-off of voicing, there was sometimes a cycle or two of voicing with the appropriate period, but with an amplitude no greater than the background noise during the consonantal closure. Such cycles were considered inaudible, so VT was labeled at the end of the last period with a peak-to-peak excursion greater than the subsequent background noise. Besides the raw signal, the representations most often used were the low-pass filter output with the cutoff set at 350 Hz and, occasionally, the RMS amplitude envelope of the low-pass filter output.
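The VT criterion can be sketched the same way. The signal is divided into glottal cycles at rising zero crossings, and the label goes at the end of the last cycle whose peak-to-peak excursion exceeds the closure noise. This is a simplified illustration with several assumptions. The 350 Hz low-pass filtering is assumed to have been applied beforehand. The noise level `noise_p2p` is assumed to have been measured by the labeler from the closure. The function names are hypothetical.

```python
from typing import Optional, Sequence


def rising_zero_crossings(x: Sequence[float]) -> list[int]:
    """Indices i where the signal goes from negative (i-1) to non-negative (i)."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]


def place_vt(x: Sequence[float], noise_p2p: float) -> Optional[int]:
    """Return the rising zero crossing that ends the last glottal cycle whose
    peak-to-peak excursion exceeds the background noise of the closure.
    Weaker trailing cycles are treated as inaudible and ignored."""
    crossings = rising_zero_crossings(x)
    vt = None
    # Each cycle runs from one rising zero crossing to the next.
    for start, end in zip(crossings, crossings[1:]):
        cycle = x[start:end]
        if max(cycle) - min(cycle) > noise_p2p:
            vt = end
    return vt
```

Low-amplitude cycles at the end of voicing do not move the label, because they never exceed `noise_p2p`. This follows the rule that cycles no louder than the closure noise count as inaudible.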

CFO (Consonantal Frication Onset)

This event is intended to mark the beginning of the noise resulting from the release of the stop consonant closure. The exact configuration of the signal at release is highly variable, consisting sometimes of one major burst and sometimes of several "sub-bursts," of which the first may not be the loudest. The label was set at the first major peak of the first burst, in either the first or the second excursion direction. This peak always fell within the first half-millisecond after burst amplitude first rose above the background noise. On rare occasions, there were episodes of noise apparently caused by air leakage during the buildup of oral pressure behind the closure. These episodes were always less than a millisecond in duration, and were always followed by re-establishment of stop closure and silence. They were therefore considered immaterial and not part of the release burst.2 The raw signal was the only representation used in labeling this event.
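A rough automatic analogue of the CFO rule is sketched here. It finds where the signal first rises above background noise, then takes the largest excursion, in either direction, within the following half-millisecond. This is a simplification and an assumption, not the procedure used in this study. It treats the largest excursion as the "first major peak." It also does not screen out the brief pre-release leakage episodes described above; here those were excluded by the labeler's judgment. The function name is hypothetical.

```python
from typing import Optional, Sequence


def place_cfo(x: Sequence[float], noise_level: float,
              fs: int = 16000, search_ms: float = 0.5) -> Optional[int]:
    """Approximate CFO as the largest-magnitude sample within search_ms of
    the first sample whose magnitude exceeds the background noise level."""
    onset = next((i for i, v in enumerate(x) if abs(v) > noise_level), None)
    if onset is None:
        return None  # nothing rises above the noise floor
    width = max(1, round(search_ms * fs / 1000))  # 8 samples at 16 kHz
    window = range(onset, min(onset + width, len(x)))
    # Largest excursion in either direction (positive or negative).
    return max(window, key=lambda i: abs(x[i]))
```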

VO  (Voice  Onset) 

This event is intended to identify the instant at which voicing is established. In this study, this event occurs after the release of a voiceless stop consonant. It was marked at a zero crossing (or, rarely, at a non-zero departure, when obvious). Voicing began during the decreasing frication that follows a release burst. It was therefore sometimes difficult to tell whether the excursion of a particular stretch of signal represented the first half-cycle of voicing or merely a low-frequency noise component. After considering both alternatives, the label was set so that the ambiguous portion fell on whichever side (noise or voicing) it differed from less conspicuously. Besides the raw signal, the representations most often used were the low-pass filter output with the cutoff set at 350 Hz and, occasionally, the RMS amplitude envelope of the low-pass filter output.

VVO (Vocalic Voice Onset)

This event is intended to mark the boundary between the absence and presence of distinct, relatively stable formant structure in F2 and higher formants. It was marked at a zero crossing. Amplitude buildup at vowel onset is quite rapid, so the full, strong formant structure appeared quite early, often during the formant transitions caused by the consonant. In cases where there were nonetheless two candidate zero crossings during the transition, the label was set at the latter, in order to leave more of the transition with the consonant which caused it. Besides the raw signal, the representations most often used were the spectrogram, the high-pass filter output with the cutoff set at 2 kHz, and the RMS amplitude envelope of the high-pass filter output.

2At the velar place of articulation ([k-g]), leakage was much more common and more likely to bleed into the true burst. This is one of the reasons apicals were chosen over velars for this study.


5.3  Durations 

For reasons previously explained (section 4.2), a consonant duration measure was desired which would be analogous across voiced and voiceless consonants. This duration was to be defined using the events as marked in voiceless consonants:

VVT VT CFO VO VVO,

and in voiced consonants:

VVT CFO VVO.

Let us consider how this should be done.

Clearly, such a measure could not use the events VT and VO, since these simply do not occur in intervocalic voiced consonants.3 This rules out any measure directly representing consonant closure, since VT, the only marked event linked to closure initiation, exists only in the voiceless consonants.4 The segment CFO-VVO is a poor choice to represent consonant duration, since it excludes the closure duration, which has long been known to be an important difference between voiced and voiceless consonants. The two remaining choices are the segment durations VVT-CFO and VVT-VVO. Both include the closure, and both are labeled in both voicing conditions. However, consider the difference in the importance of CFO in voiced and voiceless consonants.

3As a result, VT was not used in the definition of any cue in this study.

4VT would in fact be a poor representative of closure initiation, since voice termination in this environment typically follows closure. Another event (perhaps CO: closure onset) could be defined to label the amplitude drop which is the best acoustic indicator of consonantal closure, and which, unlike VT, would be usable for both voiced and voiceless consonants. Such a reworking of the event inventory was deemed tangential to the immediate needs of this study.

VOT (Voice Onset Time), one of the earliest and most studied cues, is classically measured from the CFO landmark.5 Recall (section 2.4) that Lisker and Abramson (1964) discovered three categories in initial stops (lead, short lag, and long lag). They did so by measuring the instant of voice onset, here termed VO, relative to the instant of articulator release, here termed CFO. Initial stops with voicing lead thus have negative VOT values, since voicing began "in the past" relative to articulator release. Initial stops with voicing lag have zero or positive VOT, since voicing begins at or after release.

5Obviously, CFO and other such label names are newer than the concepts they represent, and will not be found in most publications.

Klatt (1975) departs from these definitions of VOT in two important ways. First, for voiceless consonants, he measures lag not from CFO to VO, but from CFO to VVO, the "sudden onset of vertical striations [on a spectrogram] in the second and higher formants." Such VOTs would naturally be longer than Lisker and Abramson's VOTs, by the length of the VO-VVO segment. Second, and much more radically, Klatt redefines VOT in voiced consonants as identical to that in voiceless consonants. He does not measure it as VO-CFO, with VO preceding the release, but as CFO-VVO, with VVO in the following vowel. Klatt justifies this partly by the operational difficulty of locating VO in spectrograms of initial voiced stops. He also appeals to the phonological analysis (Lisker & Abramson, 1964) that presence or absence of prevoicing is not contrastive in English. This analysis is tied to English, tied to spectrographic technology, and implies two non-equivalent kinds of voicing; it would doubtless have caused logical and phonological problems if pursued too far. Also, with Klatt's redefinition, VOT is best understood as a segment containing post-release burst and aspiration noise, rather than as a dimension along which voicing can be measured as a gradable quantity. Still, the redefinition does provide a VOT measure for voiced stops which, like that for voiceless stops, is usable in both initial and medial positions. Lisker and Abramson's VOT for voiced stops, by contrast, was limited to utterance-initial position.
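A small worked example shows the practical difference between the two VOT definitions. The label times are invented for illustration and are not measurements from this corpus. The function names are hypothetical. The point is only that the same token can receive very different VOT values depending on which landmarks define the interval.

```python
def vot_lisker_abramson(cfo_ms: float, vo_ms: float) -> float:
    """Lisker & Abramson (1964): voice onset relative to release.
    Negative values mean voicing lead (prevoicing); zero or positive, lag."""
    return vo_ms - cfo_ms


def vot_klatt(cfo_ms: float, vvo_ms: float) -> float:
    """Klatt (1975): release to onset of striations in F2 and above,
    applied in the same way to voiced and voiceless stops."""
    return vvo_ms - cfo_ms


# Hypothetical label times in ms for two initial stops.
prevoiced = {"vo": 100.0, "cfo": 190.0, "vvo": 200.0}
aspirated = {"cfo": 300.0, "vo": 320.0, "vvo": 335.0}

print(vot_lisker_abramson(prevoiced["cfo"], prevoiced["vo"]))  # -90.0
print(vot_klatt(prevoiced["cfo"], prevoiced["vvo"]))           # 10.0
print(vot_lisker_abramson(aspirated["cfo"], aspirated["vo"]))  # 20.0
print(vot_klatt(aspirated["cfo"], aspirated["vvo"]))           # 35.0
```

Under Klatt's definition, the prevoiced stop receives a small positive VOT, so its voicing lead disappears from the measure. The aspirated stop's VOT grows by the length of its VO-VVO segment, 15 ms in this example.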

Serniclaes (1975-1976) came to a different resolution when faced with the analogous problem in French. Rather than redefining VOT to include the post-release segment regardless of voicing, Serniclaes split off the voicing lead, prevoicing ("prévoisement"), as a cue for voiced consonants separate from the voicing lag ("délai de voisement"). The amplitude and duration of the burst noise during voicing lag could also cue either voicing category, depending on their values. Thus, VOT as a whole is not treated as a unified concept or cue.

In light of these three very different treatments, we must conclude that
VOT  is  a more  complicated,  less  unified  concept  than  is  usually  believed.  But 
how  is  that  important  for  CFO  and  consonant  duration? 

In voiceless consonants, in both initial and medial positions, CFO begins a segment (Klatt's VOT or Serniclaes' voicing lag) containing aperiodic noise. The listener must attend to the content of this noise not only for the voicing distinction, but also for distinctions of place and manner (Lieberman & Blumstein, 1988, pp. 189-190, 192-193). For some languages, French among them, this segment and, by implication, its CFO starting point are useful even for final stop consonants, because these are typically released (Delattre, 1951; Laeufer, 1992).6

For voiced consonants, on the other hand, the situation is quite different. In this environment CFO typically has low amplitude, which makes it operationally (and possibly perceptually) hard to locate. Beyond that problem, the segment begun by CFO has never been thought to contain particularly crucial information for the voicing decision. Instead, phonetic voicing during the consonantal closure segment which CFO ends (Lisker & Abramson's voicing lead or Serniclaes' prevoicing) has been privileged as an important cue to phonological voice. Moreover, the duration of this segment is problematic. For medial and released final voiced consonants it begins at consonantal closure, while in initial position it begins at VO (at least for Lisker & Abramson and Serniclaes).

Thus,  for  voiceless  consonants,  CFO  represents  a starting  point  for  a 
segment  with  high  importance,  where  content  and  duration  are  analogous  at 
initial  and  medial,  and  sometimes  final,  positions.  For  voiced  consonants, 
CFO  represents  an  ending  point  of  the  critical  segment  rather  than  a starting 
point,  and  though  that  segment's  content  is  analogous  (at  least  at  initial  and 
medial  positions),  its  duration  is  not  analogous  for  any  of  the  three  positions. 

Within  the  context  of  this  study,  this  makes  CFO  a poor  candidate  for 
an  endpoint  of  a duration  measure  which  is  intended  as  analogous  across 
voiced  and  voiceless  stops,  especially  when  CFO's  status  as  a landmark  in 
VOT  might  give  the  false  appearance  of  analogous  phonetic  structures  across 
initial,  medial,  and  final  positions. 


6Final release is reported to be less common in English (Delattre, 1951, p. 11; Flege & Hillenbrand, 1987).




Accordingly,  this  study  will  not  use  VVT-CFO,  but  rather  VVT-VVO  as 
the  measure  of  consonant  duration.  This  segment  includes  the  articulatory 
closure  that  is  clearly  of  central  importance,  as  well  as  a consistent  portion  of 
the  transition  segments  leading  into  and  out  of  that  closure.  This  segment  is 
analogous  across  both  voicing  conditions,  and  any  comparison  of  these 
medial  durations  to  initial  or  final  conditions  can  preserve  that  analogy.  And 
finally,  this  segment  does  not  depend  on  the  questionable  analogy  of  bursts 
across  the  two  voicing  conditions. 
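With the labels in hand, consonant duration reduces to the interval VVT-VVO. Section 5.1 stresses that the standard deviations, rather than the means, will guide stimulus creation, so a per-class summary is the natural next step. The sketch makes several assumptions:

* label times are in milliseconds;
* the sample standard deviation (`statistics.stdev`) is used, though the choice between sample and population formulas is not stated here;
* the function names and example values are illustrative.

```python
import statistics
from collections import defaultdict


def consonant_duration(vvt_ms: float, vvo_ms: float) -> float:
    """Consonant duration as defined here: VVT of V1 to VVO of V2."""
    return vvo_ms - vvt_ms


def duration_stats(tokens):
    """tokens: iterable of (word_class, vvt_ms, vvo_ms).
    Returns {word_class: (mean, sample SD)} of VVT-VVO durations.
    Each class needs at least two tokens for a sample SD."""
    by_class = defaultdict(list)
    for word_class, vvt, vvo in tokens:
        by_class[word_class].append(consonant_duration(vvt, vvo))
    return {c: (statistics.mean(d), statistics.stdev(d))
            for c, d in by_class.items()}


# Hypothetical label times (ms) for three tokens of one word class.
print(duration_stats([("datat", 200, 300), ("datat", 210, 320), ("datat", 190, 310)]))
```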

5.4  Intensity 

For  reasons  previously  explained  (section  4.2),  amplitude  measures 
were  desired  which  would  not  be  analogous  across  voiced  and  voiceless 
consonants.  Voiceless  stops  have  a prominent  burst-plus-aspiration  segment 
upon  release,  compared  to  which  that  of  voiced  stops  is  negligible.  Voiced 
stops  have  audible  voicing  through  most  or  all  of  their  closure,  compared  to 
which that of voiceless stops is negligible. It was therefore decided that the amplitude of the release burst would be measured in voiceless stops, and that of closure voicing in voiced stops.7, 8

All  intensity  measurements  in  this  experiment  are  relative,  since 
absolute  intensity  is  presumed  to  be  of  negligible  importance  for  phonemic 
perception  and  was  therefore  not  controlled.9 

7Recall from the previous chapter that the term "burst" will be taken to refer to the entire burst-plus-aspiration segment. In terms of labels, this would optimally be the segment CFO-CFT, but since CFT was not used in this study, CFO-VO is the closest approximation. However, since the segment is never used per se, the distinction is immaterial.

8Henceforth the terms "voicebar" and "closure voicing" will be used interchangeably for the phonetic voicing during closure of voiced stops.

9In natural settings, amplitude is confounded with signal/noise ratio, but in low noise, listeners have no trouble perceiving either near or distant talkers, whether whispering or shouting. See also Sommers et al. (1992) for experimental support.

The Signalyze analysis software can measure relative amplitude as the difference between two specified points on an RMS (root mean square) amplitude curve. The RMS curve is derived from the acoustic speech signal by scrolling a window of specified length through the entire signal, and it is temporally aligned with the acoustic signal from which it is derived.

Three questions must be answered in order to arrive at operational procedures for measuring amplitudes. First, how long a window should be used to generate the RMS curve? Second, which points should be chosen to represent burst and voicebar, respectively? Last, what is a reasonably stable landmark against which to judge burst and voicebar amplitudes?

RMS Amplitude Curves

Regarding window length, RMS curves of many tokens were examined using several different window lengths, and a window of 10 ms was selected as the best option. A window of that length was shorter than the shortest bursts (CFO-VO for /ta/, approximately 15-20 ms), and typically rendered a single peak within the burst segment. For the voiceless tokens, these advantages outweighed one minor disadvantage: a slight rightward shift of the RMS amplitude peak. The shift arose because the window was just long enough to take in the signal excursion maxima of both the burst proper and the following aspiration. As for the voiced tokens, the window was long enough to take in approximately three pitch periods, which substantially reduced pitch-synchronous variation in the RMS curve during the voiced stretches. Thus, for both voiced and voiceless tokens, a 10 ms window was a good choice for generating RMS amplitude curves.
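The RMS curve itself is simple to compute. This sketch assumes the general technique and makes no claim about Signalyze's internals. It uses a centered 10 ms window, which is 160 samples at 16 kHz. It returns one value per input sample so the curve stays time-aligned with the signal, and it clips the window at the signal boundaries. The function name is hypothetical.

```python
import math
from typing import Sequence


def rms_curve(x: Sequence[float], fs: int = 16000,
              window_ms: float = 10.0) -> list[float]:
    """Sliding-window RMS amplitude, one value per input sample.
    The window is centered on each sample and clipped at the edges."""
    n = max(1, int(round(window_ms * fs / 1000)))  # 160 samples at 16 kHz
    half = n // 2
    # Prefix sums of squared samples give each window's energy in O(1).
    cum = [0.0]
    for v in x:
        cum.append(cum[-1] + v * v)
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), lo + n)
        out.append(math.sqrt((cum[hi] - cum[lo]) / (hi - lo)))
    return out
```

For a steady sinusoid of amplitude 1, the curve sits at about 0.707 (1/√2) wherever the window spans whole cycles. A window that spans several pitch periods keeps any residual pitch-synchronous ripple small.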

Burst  and  Voicebar  Measurements 

The  next  question  is  what  points  should  represent  the  burst  and 
voicebar  amplitudes.  The  discussion  above  alluded  to  the  fact  that  a window 
length of 10 ms generally gave a single peak during the burst, so it seemed
natural to choose the RMS peak within the burst (i.e., CFO-VO) to represent
the burst amplitude.10

It  is  doubtful  that  a point  representation  of  voicebar  amplitude  is 
justifiable  from  either  a perceptual  or  a linguistic  point  of  view.  Closure  and 
closure  voicing  are  both  long  enough  to  suggest  that  listeners  may  analyze 
and  weigh  some  kind  of  constituent  parts  (possibilities  include  onset  and 
termination  amplitude,  attenuation  slope  or  configuration,  voicing  duration 
above some threshold, etc.). This possibility gains further importance when one considers the phonological status of consonant length (e.g., in Italian or Estonian) or of closure itself (e.g., in Spanish). However, analysis of these possibilities was beyond the scope of this study, and operationalization required that a point representation be chosen.

Consider the reasonable possibilities. A point near the beginning of closure (at or near VVT) was likely to be highly influenced by the amplitude of the preceding vowel. A point near the end of closure (at or near VVO) might be influenced by the following vowel. Given the difficulty of maintaining voicing, it might also be influenced by the duration of closure. Its amplitude might be nil, neutralized by floor effects. Alternatively, it might record amplitude variations which are actually below effective perceptual thresholds. Furthermore, neither endpoint is likely to be representative of the actual amplitude slope (or configuration) during closure itself.

10See also Repp (1979) for experimental evidence that burst (per se) and aspiration combine to affect perception of voice. This strengthens the choice of an RMS peak sensitive to both as representative of burst as defined in this experiment.

Given these considerations, it was decided to use a point at the middle of closure, the midpoint between VVT and VVO, to represent voicebar amplitude. This point was more representative of the amplitude during closure itself than either endpoint, and might be influenced from both ends of the closure as well.

Relative  Amplitude  Reference  Point 

The  last  choice  is  the  reference  point  against  which  to  gauge  the  burst 
and  voicebar.  The  preference  for  maximal  generalizability  prompts  analysis 
both  within  and  across  languages.  First,  as  discussed  above  in  section  5.3, 
bursts  are  more  similar  in  initial  and  medial  positions  than  in  final  position. 
Second,  voicebar  is  vulnerable  to  elision  in  both  initial  and  final  positions, 
with  patterns  varying  by  language.  Last,  the  syllable  structure  that  is 
overwhelmingly  most  common  across  the  world's  languages,  and  within 
many if not most of them, is the CV syllable. This set of facts implies that the most useful site for a reference point is in the vowel following our medial target consonant. Such a reference would be usable in both voiced and voiceless medials, and in voiceless (and some voiced) initials, in the vast majority of the world's languages.

Two evident candidates are VVO and the peak amplitude of the vowel. Unfortunately, VVO occurs in an area where amplitude is climbing steeply. A slight change in VVO placement (e.g., by one glottal cycle, or even half of one) would therefore result in a (relatively) large change in amplitude, clearly an unacceptable characteristic for a reference point.




To  judge  the  appropriateness  of  the  vowel's  amplitude  peak  as  a 
reference,  we  must  consider  the  various  configurations  of  the  vowel's 
amplitude curves. The basic generalization is that amplitude
increases during the vowel onset, attains a relatively stable plateau during the
full vowel, then decreases during offset. Commonly (though less so after
voiced  consonants)  there  is  an  early  "hump"  in  the  curve,  with  a faster  climb 
and  a slight  drop  in  amplitude  before  the  climb  resumes  to  the  plateau.  The 
plateau  itself  can  have  aspects  of  three  "polar"  configurations:  It  may  peak 
early,  then  drift  gently  into  the  offset,  or  it  may  continue  a slow  climb  to  a late 
peak.  It  may  also  have  two  peaks,  one  early  and  one  late,  either  of  which  may 
be  higher,  with  the  plateau  slung  between  them.  Naturally,  flat  and  mid- 
peaked  plateaus  also  occur. 

It  seems  unlikely  that  human  listeners  would  go  to  the  perceptual 
trouble  of  seeking  out  the  vowel's  precise  amplitude  maximum  in  order  to 
retroactively  judge  a consonantal  amplitude,  given  the  wide  time  variation 
in its occurrence. Vowel peak amplitude is thus a poor choice for an
amplitude  reference  point. 

A more  likely  perceptual  technique  would  be  comparison  of  the 
consonant  with  the  contiguous  portion  of  the  vowel,  perhaps  the  slope  plus 
part  of  the  plateau.  A reference  point  early  in  the  plateau  would  be  relatively 
stable,  yet  reasonably  likely  to  be  influenced  by  the  early  indicators  of  vowel 
amplitude:  onset  slope,  plateau  amplitude,  and  (when  present)  the  "hump" 
mentioned above. The point VVO + 50 ms was chosen, since it lies within the
early part of the plateau but beyond the "hump."




Final  Amplitude  Measurement  Technique 

To summarize, amplitude was measured on an RMS curve generated
from  the  speech  signal  using  a 10  ms  integration  window.  Burst  amplitude  in 
voiceless  consonants  was  represented  by  the  peak  in  the  CFO- VO  segment, 
and  voicebar  amplitude  in  voiced  consonants  was  represented  by  the 
midpoint  between  VVT  and  VVO.  The  amplitude  measurements  were  made 
relative  to  the  VVO  + 50  ms  point  in  the  following  vowel. 
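The three measurement rules just summarized can be combined into a short routine. It operates on an RMS curve sampled once per signal sample, such as one produced with a 10 ms window. The sketch assumes label times in milliseconds and a 16 kHz sampling rate. It also assumes relative amplitudes are expressed in decibels, as 20·log10 of the RMS ratio. The dB scale and all names here are illustrative assumptions, not a record of how Signalyze reported the values.

```python
import math
from typing import Sequence

FS = 16000  # sampling rate (Hz)


def ms_to_index(t_ms: float, fs: int = FS) -> int:
    """Nearest sample index for a time in milliseconds."""
    return int(round(t_ms * fs / 1000))


def db_re(value: float, reference: float) -> float:
    """Level of an RMS value relative to a reference RMS value, in dB."""
    return 20.0 * math.log10(value / reference)


def reference_rms(rms: Sequence[float], vvo_ms: float, fs: int = FS) -> float:
    """RMS at the reference point VVO + 50 ms in the following vowel."""
    return rms[ms_to_index(vvo_ms + 50.0, fs)]


def burst_level_db(rms: Sequence[float], cfo_ms: float, vo_ms: float,
                   vvo_ms: float, fs: int = FS) -> float:
    """Voiceless stops: RMS peak within CFO-VO, relative to VVO + 50 ms."""
    lo, hi = ms_to_index(cfo_ms, fs), ms_to_index(vo_ms, fs)
    return db_re(max(rms[lo:hi + 1]), reference_rms(rms, vvo_ms, fs))


def voicebar_level_db(rms: Sequence[float], vvt_ms: float, vvo_ms: float,
                      fs: int = FS) -> float:
    """Voiced stops: RMS at the VVT-VVO midpoint, relative to VVO + 50 ms."""
    mid = ms_to_index((vvt_ms + vvo_ms) / 2, fs)
    return db_re(rms[mid], reference_rms(rms, vvo_ms, fs))
```

Both measures share the same reference point. A voiceless burst level and a voiced voicebar level are therefore each expressed relative to the same early-plateau point of the following vowel.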

5.5  Label  Verification 

Though  great  care  was  taken  to  get  the  labeling  right  the  first  time,  it 
was  expected  that  errors  would  nonetheless  have  crept  in,  since  the  labeling 
was  done  by  hand.  Before  any  conclusions  were  drawn,  statistical  methods 
were  used  to  find  labeling  inconsistencies  in  those  events  with  the  greatest 
susceptibility to judgment errors, VVO and VVT.

The durations of the consonant and the surrounding vowels (i.e., V1,
C, and V2) were estimated from the VVO and VVT labels in each token, and the
durations were used with the graphics routines of the DataDesk statistical
package to generate 3-dimensional rotating plots. Each point in the data cloud
represented one token, and the point was placed at the coordinates
corresponding to its V1, C, and V2 durations. The plot was displayed and rotated,
and any outlying tokens were identified.
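The screening above was done visually, by rotating the DataDesk plot. A numerical analogue is offered here only as a hypothetical stand-in, not as the procedure actually used: standardize each token's V1, C, and V2 durations and flag tokens lying far from the centroid in that standardized space. The function name and the 2.5 cutoff are assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev

def flag_outliers(durations, threshold=2.5):
    """Return indices of tokens whose standardized (V1, C, V2) vector lies
    farther than `threshold` z-units (Euclidean) from the centroid.

    durations: list of (v1_ms, c_ms, v2_ms) tuples, one per token.
    threshold: hypothetical cutoff; the original screening was by eye.
    """
    columns = list(zip(*durations))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    flagged = []
    for i, token in enumerate(durations):
        # z-score each duration within its own segment's distribution
        z = [(x - m) / s for x, m, s in zip(token, mus, sds)]
        if math.sqrt(sum(v * v for v in z)) > threshold:
            flagged.append(i)
    return flagged
```

Standardizing first keeps the segment with the widest spread from dominating the distance, which roughly matches what visual inspection of a rescaled plot achieves.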

The outlier tokens were relabeled blindly, without any reference to the
preexisting labels, and then the 2 sets of labels were compared. Differences in
label locations were examined, and appropriate adjustments were made. In
the 9 [t]s and 6 [d]s checked, no gross errors were found, although label
locations in 8 and 5 tokens respectively were slightly modified. The labeling
differences usually involved slowly changing signal properties and the
inevitable human variability in judging amplitude and spectral thresholds
along the dimensions concerned. For instance, as previously mentioned, in
setting the VVT label as voicing rolls off in [t], the decision that the
periodicity has degenerated into mere background noise is fairly arbitrary.
Even after the adjustments, the tokens remained outliers in the plot described
above; their outlying positions thus reflected genuine variation rather than
labeling error, and it was concluded that any further errors were unlikely to
have systematically affected the duration distribution of any segment
definable using the labels.

Note that all intensity measurements are comparisons between 2 points,
neither of which is itself a label. The VVO + 50 ms reference point (for both
voiced and voiceless consonants) is directly determined from label locations,
as is the mid-C point representing voicebar amplitude. The burst peak is simply
read from the RMS amplitude curve. None of these points is subject to judgment
of any sort. Therefore, though some kind of error is theoretically possible, no
special verification process is indicated for the intensity measurements.

5.6 Final Cue Definitions (Recapitulation)

To  recapitulate  the  cues  used  in  this  study,  as  operationalized: 

A system  of  labels  was  applied  to  landmark  events  in  the  tokens  of  the 
corpus.  The  labels  used  were 

VVT CFO VO VVO

for  voiceless  consonants,  and 

VVT CFO VVO

for  voiced.11 

11As  a reminder,  these  labels  abbreviate  the  following  terms:  vocalic  voice 
termination,  consonant  frication  onset,  voice  onset,  vocalic  voice  onset. 




The  cue  of  consonant  duration  is  presumed  to  be  analogous  across 
voiced  and  voiceless  consonants,  and  will  be  represented  by  the  duration  of 
the  segment  VVT-VVO,  which  is  measured  in  both  conditions. 

Two amplitude cues, which are not analogous across the two voicing
conditions, are also to be examined. Burst amplitude will be studied for
voiceless consonants and will be represented by the peak amplitude in the
segment CFO-VO. Voicebar amplitude will be studied for voiced consonants and
will be represented by the amplitude at the point midway between VVT and VVO.
The relative amplitude of these points will be determined by comparison with a
reference amplitude 50 ms after the VVO of the following vowel, for both
voiced and voiceless conditions. All these amplitudes will be read from an
RMS amplitude curve derived from the speech signal using a 10 ms
integration window.12
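The measurement chain just defined can be sketched in a few lines. This is a minimal illustration under stated assumptions: label times are given in seconds, the 10 ms RMS window is centered on each sample, and level differences are computed as 20 log10 of the amplitude ratio. All function names are hypothetical; the original measurements were made with the signal processing software of the time, not with this code.

```python
import math

def rms_curve(samples, rate_hz, window_ms=10.0):
    """Centered moving RMS over a window of about `window_ms` milliseconds."""
    half = max(1, int(round(rate_hz * window_ms / 1000.0)) // 2)
    curve = []
    for i in range(len(samples)):
        seg = samples[max(0, i - half):min(len(samples), i + half + 1)]
        curve.append(math.sqrt(sum(x * x for x in seg) / len(seg)))
    return curve

def db_below(curve, point_idx, ref_idx):
    """Positive result: the point is that many dB softer than the reference."""
    return 20.0 * math.log10(curve[ref_idx] / curve[point_idx])

def idx(t_s, rate_hz):
    """Convert a label time in seconds to a sample index."""
    return int(round(t_s * rate_hz))

def burst_amplitude_db(curve, rate_hz, cfo_s, vo_s, vvo_s):
    """Peak of the RMS curve within CFO-VO, relative to VVO + 50 ms."""
    lo, hi = idx(cfo_s, rate_hz), idx(vo_s, rate_hz)
    peak = max(range(lo, hi), key=lambda i: curve[i])
    return db_below(curve, peak, idx(vvo_s + 0.050, rate_hz))

def voicebar_amplitude_db(curve, rate_hz, vvt_s, vvo_s):
    """RMS at the VVT-VVO midpoint, relative to VVO + 50 ms."""
    mid = idx((vvt_s + vvo_s) / 2.0, rate_hz)
    return db_below(curve, mid, idx(vvo_s + 0.050, rate_hz))
```

Because both cues are expressed in dB below the same vowel-internal reference, overall recording level and talker loudness cancel out of the comparison.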


5.7  Results 

Table 5.2 reports the means and standard deviations of the phonetic
measurements. Figure 5.2 represents the data graphically. The durations are
measured in milliseconds, and the amplitudes are measured in decibels below
the reference point (VVO + 50 ms).

Comparison  with  other  studies 

The  measurement  protocol  for  this  experiment  is  fairly  unusual.  This 
presents  advantages  and  disadvantages  regarding  comparison  of  these  results 
with  others  from  the  literature. 


12Jean-Luc  Schwartz  contributed  to  discussion  of  the  options  in 
operationalizing  the  cue  definitions. 




Table 5.2

Means and Standard Deviations of Measured Cues

Cue                  Vowel Environment   [d] mean (std. dev.)   [t] mean (std. dev.)
Consonant Duration   [i]                  99.680 (14.980)        156.91 (12.292)
                     [a]                  65.522 (10.761)        120.32 (7.9042)
                     [u]                  88.600 (11.920)        139.54 (10.430)
Voicing Amplitude    [i]                  15.796 (3.6241)
                     [a]                  12.262 (1.6750)
                     [u]                  10.415 (1.1345)
Burst Amplitude      [i]                                         13.600 (2.6567)
                     [a]                                         15.479 (2.5177)
                     [u]                                         6.1183 (1.9449)

Durations in ms; amplitudes in dB below the reference point (VVO + 50 ms).

[Figure 5.2: three panels of cue means with standard deviation bars, each plotted against Vowel Context; legend: ■ [d], • [t]]

Figure 5.2. Measured cue distributions. The symbols represent the means
and the error bars one standard deviation above and below.




This protocol depends crucially on the labeling of instantaneous signal
events as landmarks, which has required explicit and rigorous definitions of
those signal events. The labeled events are then used as needed, as the
beginning or end points of the segment whose duration or content is the cue of
interest. In most other studies in the literature, the cues of interest are
addressed as directly as possible, so the events that delimit them are either
implicit or described only briefly and in passing.

Also unusual in this study is the use of multiple signal representations
in the labeling of events. For instance, the VVT event was labeled by
reference to the full speech signal, its spectrogram, a high-pass filtered
version of the speech signal, and an amplitude curve of the high-pass signal.
Many well-known studies are based on examination of a single representation,
typically either the signal itself or its spectrogram.

Lastly, it may be noted that labeling in this study was accomplished
with minimal reference to audition. Since the goal of Experiment 2 is
precisely to measure the perceptual importance of cues, using audition to
locate the cues' defining events would have risked an undesirable logical
circularity. Numerous studies have used the experimenters' ears as a crucial
tool in the experiment.

By  way  of  contrast,  consider  the  following  methods  from  the  literature: 

Each  of  the  digital  speech  files  was  examined  using  graphical  displays  of 
the  waveform  and  interactive  listening  under  earphones.  Locations  in 
each  file  at  which  the  speech  signal  started  and  ended  were  determined 
(by  visual  inspection  with  concurrent  listening) 

The  (actual)  segments  of  the  readings  were  identified  and  marked  by 
means  of  the  same  manual-auditory  procedures  used  to  measure  the 
breath  groups.  In  essence,  a given  phoneme  was  located  in  the  signal 
by  studying  a computer-graphics  waveform  display,  and 
simultaneously  listening  to  the  signal  under  high-quality  earphones. 
(Crystal  & House,  1982,  pp.  707  & 710) 




Stop  closure  durations  were  measured  from  storage  oscilloscope 
(Tektronix  5108B)  displays  of  the  speech  waveform.  Stop  closure 
duration  was  defined  as  the  interval  between  the  last  glottal  pulse  for 
the  preceding  vowel  and  the  burst  of  the  following  stop. 

(Stathopoulos  & Weismer,  1983,  p.  396) 

Clearly, the increased power and ease of use of modern signal processing
software have engendered methodological improvements in phonetic
measurement, and this experiment has profited from them.

However, notwithstanding their presumably improved reliability,
these measurements are probably not directly comparable to most others in
the literature, due to the particular operational cue definitions used here.

Consider  for  example  the  segment  measured  in  Crystal  and  House  (1982): 

It  is  worth  noting  that  in  the  measurement  of  stops  (and  affricates)  the 
duration  of  the  hold  portion  of  the  production  was  always  measured, 
as  well  as  that  of  the  release,  whenever  the  latter  consisted  of  a plosion 
(that  is,  a burst  of  noise,  with  or  without  frication).  In  the  case  of  stops 
with  nonplosive  releases — that  is,  nasal  releases,  lateral  releases,  etc. — 
the  (so-called)  release  portion  of  the  signal  was  included  in  the 
following segment. (pp. 710-711)

From this description we can discern several things: 1) The measures of
voiced and voiceless consonants are not parallel, due to the treatment of the
release. 2) The measures of voiced and voiceless consonants are probably not
equally accurate, since the closures of voiced consonants are likely to be less
abrupt and discernible. 3) This corpus includes measurements of consonants
in clusters. 4) The measurements of voiceless consonants in isolation and in
clusters are not parallel, since nasal and lateral releases are excluded from
the latter. These are all clear and important differences from the corpus and
procedure that led to the results reported above for Exp. 1.

Since there is such a striking difference between the corpus and
methods of Exp. 1 and those of such standard investigations as Crystal and
House (1982), and since the entire justification of Exp. 1 was to use its results
as input to Exp. 2 rather than to provide a general survey of acoustic facts, no
detailed comparison of these results with those in the literature is warranted,
and none will be made.

Application  to  Experiment  2 

Typically  the  intent  of  a phonetic  survey  of  this  sort  is  to  document  the 
central  tendency  among  the  sets  measured  by  reporting  the  mean  measures, 
as  in  the  first  column  for  each  cue  in  Table  5.2.  That  is  not  the  thrust  of  this 
survey. 

Recall from Chapter 3 that speakers are presumed to have a mental
representation of the expected measurement and expected variability of each
cue used in speech perception. These expectations can be reasonably modeled
by the means and standard deviations from a phonetic survey, as carried out
in Exp. 1 and reported in the table and figure above. Chapter 3 continues with
the claim that equally expected cue measures represent equal signal strengths
for the cues in question. Within the model, this means that cues with equal
z-scores have equal signal strengths.

The  crucial  claim  in  Chapter  3,  the  claim  that  necessitated  Exp.  1,  is  that 
equivalent  changes  can  be  made  in  different  cues  if  the  modifications  are 
equivalent  with  regard  to  the  expected  variability.  In  model  terms,  this 
means  that  a modification  that  changes  its  cue  by  4 standard  deviations  is 
equivalent  to  any  other  change  of  4 standard  deviations  (in  the  analogous 
direction13)  in  any  other  cue. 


13The  analogous  direction  is  necessary  to  preserve  the  effect  of  the  change 
relative  to  the  opposing  phoneme.  See  the  discussion  in  the  following 
chapter. 




Exp.  1 was  performed  in  order  to  determine  the  standard  deviations 
reported  in  Table  5.2,  not  the  means,  because  the  standard  deviations  will  be 
used  to  specify  the  modifications  for  the  tokens  to  be  used  as  stimuli  in  the 
perceptual  experiment  that  is  Exp.  2.  Thus,  the  fundamental  result  of  Exp.  1 is 
the  standard  deviations  in  parentheses  in  Table  5.2,  as  represented  by  the 
length  of  the  bars  in  Figure  5.2. 
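The role of the standard deviations can be made concrete with the values of Table 5.2. The sketch below computes a token's z-score within its cue distribution and shifts a value by k standard deviations in the weakening direction. The dictionary keys, the sign table, and the function names are hypothetical conventions for illustration. The sign conventions follow the modifications described in the next chapter: weakening [t] duration means shortening, weakening [d] duration means lengthening, and weakening either amplitude cue means attenuation, i.e., more dB below the reference.

```python
# (mean, std. dev.) from Table 5.2. Durations in ms; amplitudes in dB
# below the VVO + 50 ms reference point.
TABLE_5_2 = {
    ("t_duration", "i"): (156.91, 12.292),
    ("t_duration", "a"): (120.32, 7.9042),
    ("t_duration", "u"): (139.54, 10.430),
    ("d_duration", "i"): (99.680, 14.980),
    ("d_duration", "a"): (65.522, 10.761),
    ("d_duration", "u"): (88.600, 11.920),
    ("d_voicing_db", "i"): (15.796, 3.6241),
    ("d_voicing_db", "a"): (12.262, 1.6750),
    ("d_voicing_db", "u"): (10.415, 1.1345),
    ("t_burst_db", "i"): (13.600, 2.6567),
    ("t_burst_db", "a"): (15.479, 2.5177),
    ("t_burst_db", "u"): (6.1183, 1.9449),
}

# Direction each cue value moves when its contrast is weakened.
WEAKENING_SIGN = {
    "t_duration": -1,    # shorter [t] closure
    "d_duration": +1,    # longer [d] closure
    "d_voicing_db": +1,  # more dB below reference = softer voicebar
    "t_burst_db": +1,    # more dB below reference = softer burst
}

def z_score(cue, vowel, value):
    """Position of `value` within the cue's distribution, in std. dev. units."""
    mean, sd = TABLE_5_2[(cue, vowel)]
    return (value - mean) / sd

def weakened_value(cue, vowel, value, k):
    """Shift `value` by k standard deviations in the weakening direction."""
    _, sd = TABLE_5_2[(cue, vowel)]
    return value + WEAKENING_SIGN[cue] * k * sd
```

For example, a 130 ms [t] closure in [a] weakened by 4 standard deviations becomes 130 - 4 × 7.9042, or about 98.4 ms, while the same 4-SD step for a [t] burst in [u] is only about 7.8 dB of attenuation. Under the model, these two very different physical changes count as equivalent.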


CHAPTER  6 

EXPERIMENT  2:  PERCEPTION  TEST  PROCEDURES 

The overall plan of this research is to use cue-internal criteria to design
analogous modifications in several different cues. A series of stimulus
tokens is then created according to the design and played to subjects in
perceptual tests. The test results are used to quantify the perceptual
importance of the various cues, which can then be compared autonomously.
Experiment 1, the phonetic survey reported in the last chapter, established the
cues and the requisite cue-internal criteria. Experiment 2 concerns the
perceptual tests. This chapter reports its methods, and the next chapter its
results.

6.1  Analogous  Weakening  Pattern  Across  Cues 

The  last  chapter  ended  with  the  statement  that  modifications  in  two 
different  cues  could  be  taken  as  equivalent  if  the  modifications  were  of  equal 
size,  as  normalized  to  their  own  cue's  distribution,  and  if  they  were  in  an 
analogous  direction.  This  notion  of  "analogous  direction"  bears  explanation. 

Recall that a cue's signaling function is dependent on its acoustic
contrast with the appropriate opposing phoneme, and that multiple cues
contribute to the fixing of each distinctive feature, and thus to phoneme
recognition. The redundant coding with multiple cues ensures robust
recognition approaching 100% under natural conditions.1 Still, if enough
cues are weak enough, the phone will tend to be recognized less often than
usual as the talker's intended phoneme, and more often than usual as the
opposing phoneme. This experiment will artificially weaken the selected cues
to provoke such misperceptions.

talkers can adjust both to the varied natural conditions and to their
perceived recognition requirements by hyper- or hypoarticulation.

Why  only  weaken  the  cues?  Why  not  weaken  half  and  strengthen 
half?  There  are  two  reasons.  First,  the  robust  recognizability  of  natural 
speech  imposes  a ceiling  effect  on  phonemic  perception.  If  a natural  token  is 
recognized  almost  perfectly,  it  is  hard  to  verify  improved  performance  in 
response  to  a strengthened  token.  Methods  such  as  signal  degradation  and 
divided  attention  tasks  are  available  for  lowering  performance  overall  to 
allow  room  for  improved  performance,  but  such  methods  would  have 
introduced  technical  and  theoretical  complications  that  were  deemed 
unwarranted  here. 

Second, the work of Joanne Miller and her colleagues (cf. Miller, 1994,
discussed briefly above in section 2.5) suggests that tokens with strengthened
cues cease at some point to be perceived simply as strong versions of the
phoneme. Using a VOT-based series representing /bi/-/pi/ and continuing
on into "*/pi/," she finds that tokens with overly long VOTs are no longer
accepted as "good" versions of /pi/. The implications of this phenomenon
have only begun to be explored, so again, strengthening of cues introduces
complications that are unwarranted within this investigation.

Furthermore, using the analogous modification direction (i.e.,
weakening) for all cues leads to greater methodological simplicity in the
resulting cue strength comparisons. It is possible to imagine experimental
designs that test the "balance" between two cues by strengthening one and
weakening another in the same token, then checking the perceptual result.
This not only risks encountering the same ceiling effects discussed above, but
also aggravates any possible problem stemming from listener knowledge of
strict covariation across cues due to the physics or physiology of speech
production. There is as yet no compelling reason to run those risks.

For  the  first  test  of  this  new  measurement  technique  it  is  thus 
preferable  from  a methodological  standpoint  to  weaken  all  the  target  cues, 
thereby  reducing  contrast  with  the  opposing  phonemes  and  ensuring 
measurable  changes  in  performance,  and  to  compare  the  resulting 
misperceptions  across  the  four  cues  at  the  equivalent  levels  of  weakening 
determined  in  Experiment  1. 

6.2  Factorial  Design  within  Cues 

The main experimental variable is clearly the magnitude of contrast
reduction (weakening), since the more each cue is weakened, the more
misperceptions can be expected, and since these results will serve as the
basis for comparing the different cues. However, since natural tokens are to
be used, each cue's original strength (modeled as its z-score within its natural
distribution in this corpus) can also be expected to influence the
probability of misperception. Therefore a factorial design was planned, with
the 2 factors being the edited level of contrast reduction and the token's
original level of contrast.

A pilot study of contrast reduction had shown two things. First,
contrast reduction of 2 standard deviations produced only a negligible
increase in misperception, and even at 5 standard deviations the proportion
of misperceptions was quite low. Second, the tokens did not sound at all
unnatural, even those at 5 standard deviations of contrast reduction. The
decision was therefore made to use three levels of this factor, set at 2, 4,
and 6 standard deviations of contrast reduction.




Treatment of the second factor, original contrast level, was fairly
simple in concept but somewhat tricky to design in practice. An error in the
technique of the pilot study mentioned above had shown that performance
differences could be neutralized by "contrast leveling," that is, by combining
large reductions of high original contrast with small reductions of low
original contrast. This pattern apparently tended to compress the stimulus
distribution around some value slightly displaced from the cue's original
distribution. The similarity of the perceptual results showed the need to
spread the tokens out along the cue's acoustic parameter in order to obtain
perceptual differences. The decision was made to treat original level of
contrast as a factor, and to use three levels of the factor. Along with the
first factor, this resulted in a 3x3 factorial design.

To understand the overall pattern of the stimulus array, recall first that
each of the word classes (/didit, dadat, dudut, ditit, datat, dutut/) is
represented by 23 to 25 tokens. Two cues are to be modified on each side
of the voicing contrast, and thus in each word class. A token will have at
most one cue modified, so the tokens in each word class will be split
between the two cues.

Since each token was measured for both of the cues, the measurements
effectively rank each token's original contrast level for each of the two cues.
To be conservative, the two tokens with the lowest original contrast for each
cue were set aside as "boundary tokens" and presumed to be already too
susceptible to misperception. The remaining tokens were grouped in thirds
by rank for each cue, and the groups were termed low, medium, and high
original contrast. One of the two cues was chosen, and one "low" token was
randomly chosen for each of the three contrast reduction levels (2, 4, and 6
standard deviations). Tokens were then chosen for contrast reduction out
of the medium and high original contrast groups, to complete the 3x3
factorial array of stimuli for that cue. The same process was repeated to fill
in the 3x3 array for the other cue. (Adjustment was sometimes necessary when
the choices for the first cue left fewer than three tokens in the second cue's
original contrast groups.) Each cue thus claimed 11 tokens from each word
group: 9 for the 3x3 factorial array, plus 2 boundary tokens. All tokens were
used as perceptual stimuli: the factorial tokens as modified, and the boundary
tokens plus any spares without modification.
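The selection procedure can be summarized as a small routine. This is a hypothetical reconstruction for illustration only. It assumes original contrast is given as one number per token, where larger means more contrast (values such as dB below reference would need to be negated first). It sets aside the boundary tokens of both cues before grouping, as in the [dutut] example, and simply raises an error where the original procedure required manual adjustment.

```python
import random

def split_thirds(ranked):
    """Split a low-to-high ranked list into low, medium, and high thirds."""
    n = len(ranked)
    a, b = round(n / 3), round(2 * n / 3)
    return {"low": ranked[:a], "medium": ranked[a:b], "high": ranked[b:]}

def build_arrays(contrast_by_cue, n_boundary=2, levels=(2, 4, 6), seed=None):
    """contrast_by_cue: {cue: {token_id: original contrast}}, larger = more
    contrast. Returns (boundary_ids, {(cue, group, level): token_id},
    unmodified_ids)."""
    rng = random.Random(seed)
    # Boundary tokens: the n lowest-contrast tokens for each cue.
    boundary = set()
    for values in contrast_by_cue.values():
        boundary.update(sorted(values, key=values.get)[:n_boundary])
    used, cells = set(), {}
    for cue, values in contrast_by_cue.items():
        pool = sorted((t for t in values if t not in boundary), key=values.get)
        for group, members in split_thirds(pool).items():
            free = [t for t in members if t not in used]
            # Raises ValueError if fewer than len(levels) tokens remain;
            # the original procedure resolved such cases by hand.
            picks = rng.sample(free, len(levels))
            for level, token in zip(levels, picks):
                cells[(cue, group, level)] = token
                used.add(token)
    all_ids = set().union(*contrast_by_cue.values())
    unmodified = sorted(all_ids - boundary - used)
    return sorted(boundary), cells, unmodified
```

Random choice within each third keeps the assignment of reduction levels independent of a token's exact rank, so the two factors are not confounded within a group.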

As an illustration, consider the disposition of the 24 tokens of [dutut].
The two cues to be investigated in /t/ are consonant duration and burst
amplitude. First, the two tokens with the shortest durations and the two with
the softest bursts were set aside as boundary tokens (4 tokens down). Next,
the low, medium, and high duration contrast groups were determined,
corresponding to the shorter, middle, and longer thirds of the remaining [t]s.
From the low contrast group, one token was randomly selected to be shortened
by 2 standard deviations, another by 4 standard deviations, and a last by 6
standard deviations (3 more tokens down). Three tokens were selected from
the medium contrast group for similar shortening, and likewise for the high
contrast group (6 more tokens down). Then the same grouping and
selection process was carried out for the burst amplitude cue (9 more tokens
down). The 2 remaining tokens were not modified.

With this plan explained, we can now state the full composition of the
corpus of the perceptual tests. Each of the 6 word classes will have the
factorial array for each of its two cues, the boundary tokens for each of the
two cues, and a few remaining unmodified tokens. The numerical breakdown is
shown in Table 6.1. Table B.7 in Appendix B shows the token-by-token
composition of the array for each of the four cues.




Table 6.1

Classification of Perceptual Stimuli

                                       Word Classes
Token Function        /didit/  /dadat/  /dudut/  /ditit/  /datat/  /dutut/
Factorial Array 1        9        9        9        9        9        9
Factorial Array 2        9        9        9        9        9        9
Boundary Tokens          4        3*       4        4        4        4
Unmodified Extras        3        2        3        1        3        2
Tokens in Class         25       23       25       23       25       24

Grand Total: 145

* The /dadat/ word class has one fewer boundary token than the other classes
because the token "da01" was a boundary token for both cues, /d/ duration
and /d/ closure voicing amplitude.


6.3  Calculating  Sensitivities  with  the  Stimulus  Array 

Consider  how  a stimulus  array  constructed  as  just  described  can  be  used 
to  calculate  the  d'  sensitivities  which  are  the  goal  of  this  study. 

Recall (section 4.1) that the edited contrast reduction is the signal to be
detected, and a hit is counted each time a token is misperceived in accord
with the weakened contrast, for instance when a shortened [t] is perceived as
"D." Conversely, the unedited tokens are noise, and a false alarm is
counted when a token is misperceived in the absence of contrast reduction,
for instance when an unedited [t] is perceived as "D." The unedited boundary
tokens are specific to each cue and word class, and responses to them provide
a false alarm rate that is analogous across conditions. The three levels of
each of the two factors are three different signal strengths for each cue, and
responses to the three signal strengths provide hit rates that are also
analogous across cue and word class conditions.

A d' is calculated by finding the difference between the z-transforms of
a hit rate and a false alarm rate. Thus, using the single false alarm rate and
the three hit rates, we can calculate a d' for each of the three signal strengths
on each factor.
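The d' computation just described can be sketched with Python's standard library. The counts in the usage line are hypothetical. The 1/(2N) adjustment, which keeps proportions of 0 or 1 from producing infinite z-scores, is one common convention and is assumed here rather than taken from this study.

```python
from statistics import NormalDist

def bounded_rate(count, n):
    """Proportion count/n, kept within [1/(2n), 1 - 1/(2n)]."""
    p = count / n
    return min(max(p, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))

def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(F), using the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def d_primes_for_levels(hits_by_level, n_per_level, fa_count, n_fa):
    """One shared false alarm rate (from the boundary tokens) set against
    the hit rate at each signal strength."""
    fa = bounded_rate(fa_count, n_fa)
    return {level: d_prime(bounded_rate(h, n_per_level), fa)
            for level, h in hits_by_level.items()}
```

For example, `d_primes_for_levels({2: 3, 4: 6, 6: 10}, 60, 2, 60)` returns one d' per weakening level, all measured from the same false alarm baseline.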

This pattern of d' calculations is very closely analogous to that
described in Macmillan and Creelman (1991, Chap. 9) as "cumulative" d'.
They describe a "two-response classification" experiment (hypothetical, but
based on Fitch et al., 1980) in which varying amounts of silence were spliced in
between the /s/ and /l/ in a token of "slit," forming a stimulus series ranging
from "slit" to "split." These stimuli are played individually and classified by
subjects into one of the two word classes. Using the response proportions, d'
values can be calculated in either of two ways: directly, by comparing endpoint
responses (i.e., responses to unmodified "slit") to those of each successive
token, or indirectly, by summing the d' values of successive adjacent stimulus
pairs.2 Macmillan and Creelman note that cumulative d' is only valid for
measuring perceptual distances between stimuli that differ on a single
perceptual dimension.3

2The  authors  pointedly  avoid  the  standard  SDT  terms  (signal,  noise,  hit,  etc.) 
for  this  experiment  design,  preferring  the  more  neutral  designations 
"endpoint"  and  a stimulus  series  number.  This  is  appropriate,  since  in  much 
speech  research,  there  is  no  difference  in  kind  between  the  members  of  such  a 
stimulus  series. 

3It is also worth noting that the difference between the direct and indirect
calculations of cumulative d' might serve as the basis for a test of perceptual
one-dimensionality.




The analogy with this study is clear. The endpoint stimuli here are the
unmodified boundary tokens, and the other stimuli are set to specified points
along the appropriate cue's single dimension in the direction of the opposing
phoneme. Cumulative d' distances can be calculated between the endpoint
stimuli and any other stimuli along the dimension of that cue.

Several differences from this study are also clear, but they can be shown
to be appropriate, given the goals and design of this project. First, the
hypothetical study uses a single acoustic phenomenon, varies it in large steps,
and takes the stimulus series all the way to an unmistakably opposite percept
(i.e., 96% "split" responses). This study uses 4 cues (2 on each side of the
contrast), varies them in small steps, and only 2 of the 145 stimuli have over a
20% chance of eliciting the opposite percept. The cues are varied only over a
short distance so that the same techniques can be used for both monomodal and
bimodal cues (see section 3.3). They are varied in small steps so that the
normalized editing pattern will continue to sound natural across different
cues, environments, and languages. And four cues are used rather than a
single acoustic parameter in order to allow comparisons appropriate to the
linguistic goals of this study.

Second, the hypothetical study uses a single repeated stimulus at each
setting, while this study (as explained in section 7.5 below) uses pools of
single-presentation stimuli distributed about the desired setting. The
"repeated stimulus" protocol is appropriate for investigating the resolving
power of the sensory processes, but with small numbers of easily
distinguished speech tokens, subjects can find themselves responding based
on memory of previous presentations rather than perception of the current
presentation, so multiple single-use tokens are employed here. Furthermore,
this study uses the expected phonetic distribution, which varies according to
linguistic criteria, to normalize stimulus design. It was therefore deemed
both more consistent and ecologically preferable to test a stimulus
distribution (i.e., a pool) shifted to the appropriate setting, rather than a set
of stimuli perfectly, yet unnaturally, fixed on the setting.4

Third and last, the hypothetical study uses a single acoustic dimension,
while this study splits each of its four cue dimensions into two factors:
original contrast level and level of contrast reduction. This use of two factors
is a reasonable way to simultaneously study and exploit the notion of expected
phonetic distribution. Note that the factorial grid can be interpreted such that
each level of one factor is instantiated by a (mini-)distribution composed of
all three levels of the other factor. For instance, high original contrast is
instantiated by the tokens at 2, 4, and 6 standard deviations of contrast
reduction. Likewise, contrast reduction of 2 std. dev. is instantiated by the
tokens at high, medium, and low original contrast. By observing the trends as
the 2-4-6 distribution scrolls across the three levels of original contrast, we
can study the importance of the original distribution, and by observing the
trends as H-M-L scrolls across contrast reduction, we can do the same with the
importance of the editing. In this way, each of the two factors serves as a
slightly different view of the acoustic dimension in which they both exist.

Overall, though the complexities of its design make it difficult to
recognize, this study has the same basic structure as Macmillan and
Creelman's example illustrating cumulative d'. As applicable to this study,
that structure allows for the calculation of a d' distance between the boundary
tokens and the tokens representing each level of each factor in the factorial
stimulus array for each cue.

4Note also that by statistical theory, the addition (or subtraction) of two
independent statistics entails the addition of the two variances, and if the two
input statistics are normally distributed, the result will be as well. We might
therefore conceive of listeners' expected accuracy for a cue as a function of the
combined variances of at least three processes: sensory acuity, articulatory
control, and idiolectal differences within the language community.

6.4  Stimulus  Construction 

Now that the conceptual framework motivating the design of the
perceptual stimulus arrangement has been explained, the modification
techniques by which individual tokens were fit to that pattern can be
discussed. The signal editing (cut and paste, and amplitude attenuation) was
done with the Digidesign Audiomedia software. The protocols used are
described below.

Shortening  [t] 

The  appropriate  values  were  determined  from  the  data  in  Table  5.2  and 
multiplied  by  2,  4,  or  6,  according  to  the  factorial  level  of  contrast  weakening. 
The resulting durations were multiplied by the digitization rate to provide a
number of digitized signal points. The requisite number of points was cut
from the digitized speech signal representing the [t] closure immediately
preceding the burst. In no case did the deletion stretch far enough back to
affect the voicing rolloff subsequent to consonant closure.
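
The arithmetic of the shortening protocol can be sketched as follows. The sketch assumes the signal is a list of samples, that the index of the first burst sample is already known, and that the Table 5.2 value is a standard deviation in milliseconds. The function name, its parameters, and the 44,100 Hz rate in the usage line are hypothetical, since the digitization rate is not restated here. The optional `protect_until` index stands in for the check that the cut never reaches the voicing rolloff after closure.

```python
def shorten_closure(signal, burst_index, sd_ms, level, sample_rate_hz,
                    protect_until=0):
    """Delete closure samples immediately preceding the [t] burst.

    signal:         list of digitized samples for one token
    burst_index:    index of the first burst sample
    sd_ms:          Table 5.2 value for this cue, in milliseconds (assumed)
    level:          contrast-weakening level, 2, 4, or 6
    sample_rate_hz: digitization rate
    protect_until:  first index that may be deleted, e.g. the end of the
                    voicing rolloff after closure (hypothetical guard)
    """
    if level not in (2, 4, 6):
        raise ValueError("level must be 2, 4, or 6")
    # duration to remove (ms) converted to a whole number of samples
    n_cut = round(sd_ms * level * sample_rate_hz / 1000)
    start = burst_index - n_cut
    if start < protect_until:
        raise ValueError("deletion would reach the protected region")
    # keep everything before the cut, then resume at the burst
    return signal[:start] + signal[burst_index:]


# hypothetical 5 ms value at level 2 and 44.1 kHz: 441 samples removed
token = [0] * 5000
shorter = shorten_closure(token, burst_index=3000, sd_ms=5.0, level=2,
                          sample_rate_hz=44100)
print(len(token) - len(shorter))  # 441
```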

Attenuating /t/ Burst Amplitude

The  appropriate  values  were  determined  from  the  data  in  Table  5.2  and 
multiplied  by  2,  4,  or  6,  according  to  the  factorial  level  of  contrast  weakening. 
In  each  token,  the  burst  region  of  the  signal,  from  just  before  CFO  up  to  and 
excluding  VO,  was  isolated  and  copied.  The  copy  was  attenuated  by  the 
appropriate amount, using the Audiomedia signal processing software. Then
the copy was substituted for the appropriate portion of the signal in the
original token.

Attenuating /d/ Voicing Amplitude

The  appropriate  values  were  determined  from  the  data  in  Table  5.2  and 
multiplied  by  2,  4,  or  6,  according  to  the  factorial  level  of  contrast  weakening. 
In  each  token,  the  non-vocalic  voiced  closure  region  of  the  signal,  from  VVT 
up to and excluding CFT,⁵ was isolated and copied. The copy was attenuated
by  the  appropriate  amount  using  the  Audiomedia  signal  processing  software. 
Then  the  copy  was  substituted  for  the  appropriate  portion  of  the  signal  in  the 
original  token. 
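
Both attenuation protocols apply the same operation to different regions: from just before CFO up to but excluding VO for the [t] burst, and from VVT up to but excluding CFT for the [d] closure voicing. The sketch below is a minimal version under stated assumptions. It assumes the Table 5.2 values are expressed in decibels and that samples are 16-bit integers. The function name and the clamping step are hypothetical and do not describe Audiomedia's internals. Python's half-open slices match the protocol's "up to and excluding" boundaries.

```python
def attenuate_region(signal, start, end, atten_db):
    """Return a copy of signal with samples in [start, end) attenuated.

    atten_db: attenuation in dB (assumed to be the Table 5.2 value
              multiplied by 2, 4, or 6)
    The region excludes `end`, matching "up to and excluding" in the
    protocol (e.g. start=CFO, end=VO for the burst; start=VVT, end=CFT
    for closure voicing).
    """
    gain = 10 ** (-atten_db / 20)  # amplitude ratio for a dB reduction
    region = []
    for x in signal[start:end]:
        y = round(x * gain)
        # keep results inside the 16-bit sample range (assumed format)
        region.append(max(-32768, min(32767, y)))
    # substitute the attenuated copy for the original portion
    return signal[:start] + region + signal[end:]


out = attenuate_region([1000] * 10, 2, 5, 20.0)
print(out)  # [1000, 1000, 100, 100, 100, 1000, 1000, 1000, 1000, 1000]
```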

Lengthening  [d] 

The  lengthening  of  [d]  is  a different  problem  from  the  other  cue 
modifications,  because  it  requires  inserting  a signal  with  some  content  into 
the  tokens,  rather  than  deleting  part  of  the  signal,  or  modifying  the  signal  in 
place.  This  requires  a decision  on  what  the  content  of  the  increased  duration 
of  the  signal  will  be. 

Upon  reflection,  it  was  decided  that  the  best  option  was  to  treat  periods 
of  the  fundamental  as  separate  units,  to  duplicate  them  in  place,  and  to 
progressively  favor  duplication  of  the  later  periods  of  voicing.  Periods  of  the 
fundamental  were  chosen  as  units,  rather  than  some  arbitrary  unit  of  length, 
because  they  are  the  most  coherent  acoustic  chunks  visible  in  the  natural 
speech  signal,  and  because  their  use  causes  the  least  disruption  in  the  acoustic 
structure of the extant partials and formants. They were duplicated in place,
rather than as a series followed by its twin,⁶ in order to avoid any perception
of an artificial restart or loopback.⁷ Later periods were duplicated more in
order to favor the periods with the lowest amplitude.⁸

⁵ Recall that VVT (Vocalic Voice Termination) indicates the loss of stable
formant structure.

Voicing periods were identified starting from the last full period before CFT
and working backwards. These may be designated, in "alphabetic countdown" order:
{N, . . . , C, B, A}. The duration of each period was measured. The appropriate
values were determined from the data in Table 5.2, and multiplied by 2, 4, or
6, according to the factorial level of contrast weakening (duration lengthening,
in this case). A HyperCard program was written to help determine the
number and identity of the voicing periods to add to each token. The
program took two inputs: the goal duration and the durations of the periods.
The  program  then  calculated  the  added  duration  of  the  extra  periods, 
accumulated  in  the  order  specified  below  in  Table  6.2.  The  program 
accumulated  extra  periods  until  it  exceeded  the  goal  value,  then  the 
researcher  determined  whether  the  goal  duration  was  closer  to  the  extra 
duration  with  or  without  the  final  added  period. 

Table 6.2 is to be read and applied as follows. Consider the final 3
periods, CBA. If only 1 period needed to be added, it would be an extra A, and
the resulting sequence would be CBAA. If 2 periods were needed, they would
be a B and an A, resulting in CBBAA. If 3 were needed, they would be a B and
2 As, resulting in CBBAAA. Now note that the "3" in the main body of the
table is located at the cell representing 2 extra insertions for the A period.
Backtracking numerically, the "2" demands 1 extra B. The "1" in the column
for A has been superseded by the "3" above it, and therefore requires no action.
Thus, the body of the table shows the sequence in which the periods
designated along the bottom are added. The two columns on the right show,
for a designated period reaching a given row, how many extra and how many
total occurrences of that period should appear in the token in question.

⁶ The sequence CBA would be lengthened as CCBBAA, not CBACBA.

⁷ Some readers may recall the cybernetic stutter of the lead character in the
1980s television show Max Headroom. This is the effect avoided by
duplication in place.

⁸ The sequence CBA would be lengthened as CBBAAA, not CCBBAA.

Table 6.2

Pattern of Period Duplications for Lengthening of [d].

      Addition Sequence             Extra        Total
                                 Insertions   Occurrences
                            21        6            7
                       20   15        5            6
                  19   14   10        4            5
             18   13    9    6        3            4
        17   12    8    5    3        2            3
   16   11    7    4    2    1        1            2

    F    E    D    C    B    A
    Period Designation
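
The pattern in Table 6.2 fills the table diagonal by diagonal. Each new diagonal starts one period further back (A, then B, then C, and so on) and works forward toward A, giving each later period one more extra copy than the period before it. The following minimal sketch regenerates the table's numbering and the resulting duplicated sequence. The function names are hypothetical, and labels follow the "alphabetic countdown" in which A is the last full period before CFT.

```python
import string


def addition_order(n_periods, n_additions):
    """First n_additions insertions as (period label, extra count) pairs.

    Diagonal k adds one copy of the k-th period back from CFT, then one
    more copy of each later period, ending with A.
    """
    labels = string.ascii_uppercase[:n_periods]  # 'A' = last full period
    order = []
    diag = 0
    while len(order) < n_additions:
        if diag >= n_periods:
            raise ValueError("not enough voicing periods for this many additions")
        for j in range(diag, -1, -1):  # from the newly reached period to A
            order.append((labels[j], diag - j + 1))
            if len(order) == n_additions:
                break
        diag += 1
    return order


def lengthened_sequence(n_periods, n_additions):
    """Period sequence in time order after duplication in place."""
    labels = string.ascii_uppercase[:n_periods]
    counts = {p: 1 for p in labels}
    for label, _extra in addition_order(n_periods, n_additions):
        counts[label] += 1
    # time order runs from the earliest period back to A, e.g. C, B, A
    return "".join(p * counts[p] for p in reversed(labels))


print(lengthened_sequence(3, 1))  # CBAA
print(lengthened_sequence(3, 3))  # CBBAAA
```

The position of a pair in `addition_order(6, 21)` plus one gives the number printed in the corresponding cell of the table. For example, `("A", 2)` is third and `("F", 1)` is sixteenth.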

The  HyperCard  program,  with  the  input  of  the  period  durations, 
accumulated  the  durations  according  to  the  pattern  in  Table  6.2,  up  to  and 
beyond  the  goal  duration.  The  researcher,  with  the  output  from  the  program 
as  a guide,  used  the  Audiomedia  signal  editing  capabilities  to  duplicate  the 
various  periods  of  voicing  the  requisite  number  of  times,  thus  generating  the 
lengthened  [d]. 
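
The HyperCard program itself is not reproduced here. The following is a hypothetical Python reconstruction of the procedure described above. It accumulates period durations in the Table 6.2 order until the goal is reached or exceeded. It then keeps or drops the final period according to which total lies closer to the goal, which is the decision the researcher made by hand. The addition order is recomputed inline so the block stands alone. Breaking an exact tie in favor of keeping the final period is an assumption.

```python
def choose_additions(goal_ms, period_ms):
    """Decide how many extra periods to insert to approach goal_ms.

    period_ms: durations in countdown order, [A, B, C, ...]
    Returns (number of additions, added duration in ms).
    """
    n = len(period_ms)
    # Table 6.2 order as period indices (0 = A, 1 = B, ...)
    order = [j for diag in range(n) for j in range(diag, -1, -1)]
    total = 0.0
    for count, j in enumerate(order, start=1):
        previous = total
        total += period_ms[j]
        if total >= goal_ms:
            # keep the last period only if that lands closer to the goal
            if total - goal_ms <= goal_ms - previous:
                return count, total
            return count - 1, previous
    raise ValueError("voicing periods exhausted before reaching the goal")


# hypothetical periods near 200 Hz and a 12 ms goal
print(choose_additions(12.0, [5.0, 5.25, 5.5]))  # (2, 10.25)
```

With these durations, a 12 ms goal stops at 10.25 ms because 15.25 ms would overshoot by more. A 14 ms goal keeps the third period (15.25 ms).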

There are two important problems to discuss with the lengthened
tokens: lengthening precision, and voicing rolloff rates. Recall that for the
other 3 cue modifications, one shortening and two attenuations, the accuracy
of the modifications is, in practical terms, perfect.⁹ For lengthening, on the
other hand, accuracy can only be guaranteed to within one half the duration
of the glottal period whose addition carries the accumulated duration past the
goal. Half a period, assuming a fundamental frequency of 200 Hz, is 2.5 ms, a
small but possibly perceptible duration, given the [d] duration distribution in
[dadat]: the mean is 65.522 ms and the standard deviation is 10.761 ms. The
problem of voicing rolloff rates, however, ultimately overshadows that of
lengthening precision.
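
The precision figure follows from simple arithmetic. At a fundamental frequency f0, one period lasts 1000/f0 ms. Because the total closer to the goal was kept, the miss is at most half the period that crosses the goal. Expressing that bound in units of the [dadat] standard deviation quoted above shows why it is small but possibly perceptible. The 200 Hz value is the same illustrative assumption used in the text, and the function name is hypothetical.

```python
def max_lengthening_error_ms(f0_hz):
    """Worst-case miss when rounding to whole glottal periods (half a period)."""
    period_ms = 1000.0 / f0_hz
    return period_ms / 2


DADAT_D_SD_MS = 10.761  # std. dev. of [d] duration in [dadat], from the text

err = max_lengthening_error_ms(200.0)
print(err)                            # 2.5
print(round(err / DADAT_D_SD_MS, 3))  # 0.232
```

At 200 Hz, the worst-case error is therefore a little under a quarter of a standard deviation of natural [d] duration.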

The  problem  of  maintaining  voicing  in  the  face  of  stop  consonant 
closure  is  well  known  (Bell-Berti,  1975;  Lisker,  1977),  and  one  of  the  effects  of 
this  problem  is  the  decrease  of  voicing  amplitude  through  the  duration  of  the 
closure.  It  is  reasonable  to  expect  that  listeners  have  some  knowledge  of  the 
expected  configuration  of  that  rolloff.  The  modifications  used  to  lengthen  the 
[d]s  also  maintain  the  voicing  at  a higher  amplitude  than  the  normal  voicing 
rolloff  would  allow.  Thus,  while  the  lengthening  of  the  [d]  closure  is 
designed  to  weaken  the  voicing  perception  by  reducing  the  contrast  with  /t/, 
the  artificial  maintenance  of  voicing  amplitude  strengthens  the  voicing 
percept  by  increasing  the  contrast  with  /t/.  In  essence,  the  modification  works 
against  itself,  though  the  balance  of  levels  between  the  two  contradictory 
effects  cannot  be  foreseen. 

It  should  be  possible  in  future  experiments  to  investigate  the  patterns 
of  amplitude  rolloff,  and  to  apply  those  patterns  to  the  lengthened  closure 
voicing,  but  this  option  was  not  feasible  within  the  framework  of  the  present 
study.  Here  it  will  simply  be  necessary  to  note  that  the  perceptual  results  for 
lengthened  [d]  may  be  abnormal. 

⁹ In fact, the accuracy of shortening is limited by the digitization rate, and the
accuracy of attenuations is limited by the integer format used in internal routines
in Audiomedia. These are trivial limitations, as they are well below any
measurable perceptual effect.

6.5  Stimulus  Presentation 

The  exact  protocol  whereby  the  subjects  are  presented  with  the  stimuli 
can  either  cause  or  alleviate  a number  of  diverse  problems.  This  section 
discusses  the  measures  taken  to  alleviate  those  which  were  foreseen. 

Randomization  and  Order  Effects 

It  is  naturally  important  that  the  subjects  base  their  responses  on  their 
perception  of  the  stimuli,  rather  than  on  their  belief  in  some  pattern  in  the 
order  of  their  presentation.  The  obvious  and  common  solution  to  this 
problem  is  to  randomize  the  stimuli. 

Unfortunately,  this  is  not  always  enough.  Psychologists  and 
phoneticians  have  shown  the  existence  of  order  or  context  effects,  wherein 
preceding  context,  including  other  stimuli,  can  have  a measurable  effect  on 
the  responses  to  immediately  following  stimuli.  One  solution  to  this 
problem,  when  there  are  long  sequences  of  stimuli  to  present,  is  to  control 
their  presentation  by  computer,  and  have  a new  randomization  generated  for 
each  presentation.  Any  such  effects  would  thus  presumably  be  spread  equally 
through the response set, rather than consistently carrying over from each
stimulus to the same following stimulus. This is only feasible, however, when the subject's
responses  are  also  recorded  by  computer  and  can  be  re-ordered  by  the  same 
software  that  did  the  randomization. 

In this case, the equipment was not available for subjects to respond by
computer, so an alternative had to be found. One of the solutions to order
effects is to use a single presentation order, but to use it in both forward and
reverse directions. For ease of response tracking, this was the solution
chosen.

The  presentation  order  used  was  a modified  true  randomization. 

Recall  that  the  reason  for  randomizing  is  to  keep  the  subjects  from 
responding  on  order  criteria  rather  than  perceptual  criteria.  A true 
randomization  allows  for  the  possibility  of,  for  instance,  all  the  /d/  stimuli 
first, followed by all the /t/ stimuli, or the opposite possibility of a strict
alternation  between  /t/  and  /d/  stimuli.  Neither  of  these  unlikely 
possibilities,  though  legitimately  random,  would  be  perceived  as  such  by 
subjects.  Instead,  they  would  distract  the  subjects  from  the  perceptual  task  at 
hand. 

To  avoid  this  problem,  five  different  random  stimulus  orders  were 
generated.  Sequences  of  4 or  more  stimuli  originally  intended  as  the  same 
phoneme  were  tallied  (e.g.,  4 original  [d]s  in  a row,  modified  or  not).  The 
remaining  stimuli  seemed  well  distributed  in  sequences  of  1,  2,  and  3.  One  of 
the  randomizations  was  chosen  and  modified  so  that  it  contained  4 sequences 
of  four-in-a-row,  2 of  five-in-a-row,  1 of  six-in-a-row,  and  no  longer 
sequences.  This  quasi-random  order  was  judged  unlikely  to  distract  subjects 
from  the  perceptual  stimuli  as  such. 
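
The tally of same-phoneme runs can be sketched as a run-length count over the originally intended phoneme of each stimulus, ignoring modification, as the text specifies. The function names and the list-of-labels input are hypothetical. The acceptance check mirrors the constraints reported for the chosen order: 4 runs of four, 2 of five, 1 of six, and none longer.

```python
from collections import Counter
from itertools import groupby


def run_length_tally(phonemes, min_run=4):
    """Count runs of identical intended phonemes of length >= min_run.

    phonemes: sequence like ["d", "t", "t", ...], one entry per stimulus,
              giving the phoneme originally intended (modified or not)
    Returns a Counter mapping run length to number of such runs.
    """
    lengths = (len(list(group)) for _key, group in groupby(phonemes))
    return Counter(n for n in lengths if n >= min_run)


def meets_reported_constraints(tally):
    """True if the tally matches the order finally adopted."""
    return tally == Counter({4: 4, 5: 2, 6: 1})


order = list("ddddtttttdtdttttttd")
print(sorted(run_length_tally(order).items()))  # [(4, 1), (5, 1), (6, 1)]
```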

Thus,  the  stimuli  were  presented  according  to  a single,  quasi-random 
order,  in  both  forward  and  reverse  directions. 

Stimulus  Presentation  and  Subject  Accommodation  Difficulties 

Despite training and practice before the test (described in section 6.7),
some proportion of the subjects were expected to become confused or
have  difficulty  accommodating  to  the  task  at  the  beginning  of  the  test.  Two 
methods  were  used  to  alleviate  this  problem. 

First,  the  starting  point  in  the  presentation  order  was  cycled  through 
the stimuli. Specifically, the 145 randomized stimulus tokens were divided
into 5 groups of 29 each, termed {A, B, C, D, E}, then approximately equal
numbers  of  subjects  were  tested  beginning  with  each  of  the  5 groups. 
Combined  with  the  forward-reverse  difference  in  direction  of  presentation 
order,  this  spread  the  risk  of  lost  data  from  subject  confusion  over  10  different 
regions  of  the  order. 

Second,  it  was  decided  to  include  at  the  end  of  each  test  a repetition  of 
the  initial  group.  That  way,  no  data  at  all  should  be  lost  to  the  subjects'  initial 
confusion  or  accommodation  processes.  Moreover,  by  comparing  the 
performance  on  the  initial  group  and  its  later  repetition,  we  could  test 
whether  subjects'  initial  responses  are  indeed  reliable. 

It  may  be  important  to  recognize  the  consequences  of  this  decision:  in 
every  administration  of  the  test,  one-fifth  of  the  tokens  are  presented  twice. 
While  the  responses  from  the  first  presentation  of  the  repeated  tokens  could 
be  simply  discarded  in  favor  of  the  second,  this  seems  to  be  an  overreaction  to 
the  danger  of  subject  difficulty  with  the  response  task.  The  initial  responses 
should  be  eliminated  if  proven  flawed,  but  response  data  is  too  valuable  to 
discard  without  direct  evidence  of  a problem.  Furthermore,  any 
contaminating  influence  of  the  first  presentation  on  the  second  seems  highly 
unlikely,  given  the  116  tokens  between  the  two  groups,  and  indeed,  the  144 
tokens  intervening  between  token  repetitions. 

To  recapitulate,  for  each  presentation  of  the  perceptual  test,  a subject 
would  listen  to  and  judge  not  just  the  145  unique  stimuli,  but  174  stimuli. 
These  constituted  six  fifths  of  the  stimuli,  arrayed  so  that  the  last  29  stimuli 
are  identical  to  the  first  29.  Along  with  the  5 starting  points  and  the  2 
directions,  this  fully  defines  the  10  presentation  sequences  used: 




Forward: 

A B C D E A 
B C D E A B 
C D E A B C 
D E A B C D 
E A B C D E 

Backward: 

A E D C B A 
B A E D C B 
C B A E D C 
D C B A E D 
E D C B A E 
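
The ten sequences follow mechanically from the five groups, the two directions, and the repetition of the opening group. The following minimal sketch regenerates the listing above. The function name is hypothetical, and each letter stands for its block of 29 stimuli, so six blocks give the 174 stimuli per administration.

```python
GROUPS = ["A", "B", "C", "D", "E"]  # five blocks of 29 stimuli each


def presentation_sequence(start, direction):
    """Group order for one administration, with the first group repeated.

    start:     starting group label, "A" to "E"
    direction: "forward" or "backward"
    """
    order = GROUPS if direction == "forward" else GROUPS[::-1]
    i = order.index(start)
    rotated = order[i:] + order[:i]  # cycle through all five groups
    return rotated + [rotated[0]]    # repeat the opening group at the end


for direction in ("forward", "backward"):
    for g in GROUPS:
        print(direction, " ".join(presentation_sequence(g, direction)))
```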


6.6  Material 

The  corpus  having  been  explained  above,  the  remaining  material  to  be 
described  consists  essentially  of  the  equipment  used  for  the  listening  tests, 
plus  the  answer  sheets  used  for  the  subjects'  responses. 

The stimuli were stored as sound files on an Apple Macintosh IIfx
microcomputer.  Their  presentation  was  controlled  by  programs  written  in 
HyperTalk  and  run  on  HyperCard.  These  programs  required  the  selection  of 
one  of  the  ten  presentation  sequences  described  above.  External  function  calls 
then  played  preparatory  material  and  the  stimuli  through  an  Audiomedia  II 
sound  card.  The  sound  card  output  was  directed  through  a high  quality 
commercial  audio  amplifier  into  a pair  of  Sony  MDR-CD777  headphones  in  a 
sound  booth.  The  stimuli  were  played  at  a comfortable  listening  level,  with 
the loudest stimulus peaking at approximately 70 dB above the threshold of
audibility. 

The  answer  sheet  consisted  primarily  of  six  columns  of  responses  to 
circle,  with  29  pairs  of  responses  per  column,  for  a total  of  174  responses.  The 
responses were numbered, with a dashed line before every tenth response to
correspond with the "sequencing" beep described below (section 6.7). The
answer  sheet  came  in  two  versions,  with  the  response  alternatives  ordered 
respectively  as  "D  T"  and  as  "T  D."  The  subjects  in  each  group  were  split 
equally  between  these  two  versions  to  guard  against  any  possible  prejudicial 
effect  of  the  ordering. 

The  reverse  side  of  the  answer  sheet  had  two  parts.  One  was  the 
twelve  response  alternatives  for  the  practice  test  based  on  /p  - b/,  described 
below.  The  other  was  space  for  information  about  the  subject’s  personal, 
educational,  and  linguistic  background,  to  be  filled  out  by  the  researcher 
during  a short  interview  after  the  perceptual  test.  This  information  was 
recorded  for  possible  post-hoc  analysis. 

Appendix  A shows  a copy  of  the  answer  sheet. 

6.7  Subjects'  Task  and  Preparation 

The  preparation  of  the  subjects  comprised  two  basic  aspects:  experience 
with  the  talker’s  voice,  and  training  and  practice  with  the  tasks  of  perceptual 
decision  and  response.  These  two  preparations  were  accomplished  in  tandem 
in  the  following  way. 

Upon  arriving  in  the  lab,  each  subject  was  taken  into  the  sound  booth, 
seated  at  the  table  where  the  test  was  taken,  and  given  a sheet  of  instructions. 
The  instruction  sheet  in  its  original  French  and  its  English  translation  are  to 
be  found  in  Appendix  A.  After  thanking  the  subjects,  it  asks  them  to  notify 
the  researcher  if  they  have  ever  had  any  hearing  problems.  It  then  explains 
with written examples that their task is to listen to the word-medial
consonant,  and  decide  whether  they  felt  it  was  a D or  a T.  It  shows,  with  an 
example,  how  to  mark  the  answers  on  the  answer  sheet,  and  explains  the  use 
of  the  beep  to  help  keep  the  numbering  right. 




The  instruction  sheet  then  introduces  a practice  session,  using  12 
tokens  of  unmodified  pseudowords  with  bilabials  (/b,  p/)  instead  of  the 
apicals  (/d,  t/)  of  the  real  experiment.  The  practice  tokens  were  the  subjects’ 
first  introduction  to  the  talker's  voice.  The  answers  to  the  training  stimuli 
were  recorded  on  the  back  of  the  main  answer  sheet,  in  a column  of 
appropriate  "P  B"  alternatives. 

When  the  practice  test  had  been  completed,  the  experimenter  entered 
the  sound  booth  and  verified  that  the  answers  were  correct  and  correctly 
recorded.  If  they  were  not,  the  subject  was  verbally  re-instructed,  and  the 
practice  test  was  run  again.  Once  the  practice  test  was  successful,  the 
experimenter  asked  for  and  answered  any  further  questions,  then  left  the 
sound  booth  to  begin  the  full  perceptual  test. 

Each  repetition  of  the  perceptual  test  was  prefaced  by  a short  tale  read  by 
the  talker,  to  ensure  that  the  listeners  were  uniformly  accustomed  to  the 
talker's  voice.  The  tale  was  "La  Bise  et  le  soleil"  [The  North  Wind  and  the 
Sun],  and  was  adapted  from  The  Principles  of  the  International  Phonetic 
Association  (1949).  After  the  tale,  there  followed  a short  admonition  read  by 
the  talker  to  follow  the  instructions  as  previously  explained,  then  the  test 
began.  The  stimulus  tokens  were  played  by  the  HyperCard  program  in 
whichever  presentation  sequence  was  selected,  with  an  interstimulus 
interval  of  5 seconds.  Before  every  tenth  stimulus  token,  a beep  was  played, 
to  allow  the  subjects  to  verify  they  were  keeping  up  appropriately  or, 
hopefully,  to  allow  them  to  get  back  in  step.  After  the  174th  and  last  stimulus, 
three  beeps  were  played  and  the  experimenter  entered  the  sound  booth  to  ask 
the  background  questions  previously  mentioned. 

Whenever  possible,  the  experimenter  got  the  subjects  to  take  the  test 
more than once, changing the presentation sequence through the following
cycle: A-forward, A-reverse, B-forward, . . . , E-reverse, A-forward, and so on.
Within each subject, the order of the response alternatives on the answer sheet
("D T" or "T D") remained the same. This information on retests is provided for
completeness only, since the retests will not be analyzed here.

The  "earphones  on"  portion  of  each  administration,  from  the  tale  for 
voice  acquaintance  through  the  last  token,  took  about  10  minutes.  Almost  all 
subjects  took  the  test  twice,  which,  along  with  the  interview  for  subject 
background  information,  took  a total  of  about  half  an  hour  of  the  subject's 
time.  Subjects  who  took  the  test  more  than  twice  took  correspondingly 
longer,  and  took  pauses  at  their  own  discretion.  Some  subjects,  especially 
those associated with the ICP or the Université de Grenoble, retook the test at
their  convenience  on  different  days.  The  short  test  duration  and  the 
flexibility  of  its  administration  were  important  for  the  recruitment  of  the 
large  number  of  subjects  that  ultimately  participated. 

6.8  Subject  Groups 

All willing persons with no self-reported history of hearing difficulties
were  accepted  as  listeners.  They  were  divided  post-hoc  into  four  different 
subject  groups  based  on  the  information  gleaned  by  the  researcher  in  the 
interview  after  the  first  test.  Native  speakers  of  French  were  separated  from 
native  speakers  of  any  other  language,  and  subjects  with  no  special 
metalinguistic  experience  were  separated  from  those  with  vocational  or 
avocational  interests  in  language.  The  four  groups  were  termed  Naive 
Natives,  Naive  Foreigners,  Linguist  Natives,  and  Linguist  Foreigners,  and  the 
number of subjects in each group is reported in Table 6.3.

The justification for constituting groups on the basis of these two
criteria, native language and metalinguistic experience, resides in the
possibility, even the expectation, of important differences in knowledge of
language across the groups so defined. The groups were constituted so that such
differences could be tested, and so that any difference could help demonstrate
the utility of the perceptual measurement which is the focus of this study.

Table 6.3

Numbers of Subjects per Group

                       Subject Classes
                  N (native)    F (foreign)
N (naive)             43             23
L (linguist)          21             19

Grand Total: 106

"Interference"  is  the  term  commonly  used  by  linguists  to  describe  the 
influence  of  an  earlier,  typically  native  language  on  a subject's  performance 
and  competence  in  a language  learned  later  (Grosjean,  1982).  Phonetic  or 
phonological  interference  is  manifested  in  production  as  the  well-known 
phenomenon  of  foreign  accent,  but  evidence  also  exists  for  interference 
phenomena  in  perception  of  foreign  languages  (Flege,  1991).  Recall  that  cue 
settings  differ  across  languages  (section  2.4),  and  that  the  measurement 
technique  this  research  project  describes  is  intended  for  use  in  testing  whether 
cue  importance  differs  across  languages  as  well.  Under  these  conditions,  the 
possible  interference  from  the  listeners'  first  language  in  their  perceptual 
processing  of  French  as  their  second  clearly  warrants  the  separation  of  native 
and  non-native  speakers.  Any  questionable  cases  were  assigned  to  the  foreign 
categories,  so  that,  for  instance,  French-born  children  of  North  African 
immigrant  parents  were  typically  classified  as  foreigners  even  when  French 
rather than Arabic was clearly their dominant language, since the possibility
could not be excluded that the accented French in the immigrant community
had  somehow  influenced  their  acquisition  of  the  cue  structure  of  French. 

It  is  important  to  note  that  no  restrictions  were  placed  on  the  native 
language  of  the  foreign  groups,  and  therefore  a large  and  uncontrolled 
number  of  different  languages  and  language  families  are  represented.  As  a 
result,  any  characterization  of  the  behavior  of  the  foreign  group  as  a whole 
cannot  be  validly  taken  as  applying  to  any  particular  language,  nor  to 
language  in  general.  Any  differences  between  natives  and  foreigners  can  at 
best  be  taken  as  general  evidence  for  an  effect  of  language  experience  on  the 
process  of  phonetic  perception.  Specific  comparisons  of  languages  would 
require more stringent protocols controlling the subjects' language
backgrounds.

The  division  of  "linguists"  from  "naives"  was  a safety  measure.  Since 
the  testing  was  conducted  in  an  academic  environment,  with  all  comers 
accepted  as  subjects,  a large  number  of  the  subjects  were  the  linguists, 
phoneticians,  and  language  teachers  who  were  the  researcher's  colleagues  and 
acquaintances.  This  group,  trained  to  give  metalinguistic  attention  to 
linguistic  tasks,  might  well  exhibit  different  behavior  from  the  general 
population,  either  by  natural  ability  or  by  long  experience.  It  was  therefore 
decided  to  separate  those  subjects  with  linguistic  training  or  a professional 
interest  in  language  from  those  for  whom  language  was  simply  a tool  for 
communication  or  expression.  Any  questionable  cases,  such  as  acoustic 
engineers  working  temporarily  on  speech,  or  those  odd  persons  who  read 
linguistics  or  learned  languages  as  a hobby,  were  assigned  to  the  linguist 
categories,  in  order  to  ensure  the  common  level  of  metalinguistic  innocence 
of  the  naive  categories. 


CHAPTER  7 
EXPERIMENT  2: 

PROCESSING  AND  TABULATION  OF  PERCEPTUAL  DATA 

The  previous  chapter  presented  the  design  and  protocol  of  the 
perceptual  tests.  This  chapter  discusses  how  the  raw  data  from  those  tests 
were  processed  and  organized  into  usable  results. 

7.1  A Terminological  Note 

Certain  terms  are  commonly  used  from  this  point  onwards  to  refer  to 
different  classes  of  stimuli  and  responses,  and  deserve  a brief  explanation. 

Recall  (Chap.  3)  that  SDT  furnishes  certain  terms:  signal,  noise,  hit, 
miss,  false  alarm,  correct  rejection.  Signal  tokens  are  those  whose  cues  have 
been  modified  to  encourage  misperception  as  the  opposing  phoneme.  By 
contrast,  noise  tokens  are  those  which  have  not  been  modified.  Responses  to 
signal  tokens  are  either  hits  or  misses,  and  responses  to  noise  tokens  are 
either  false  alarms  or  correct  rejections. 

Due  to  the  redundancy  of  cues  in  the  speech  signal,  the  majority  of 
tokens,  regardless  of  any  modifications,  are  perceived  according  to  the 
speaker's original intention; spoken [t] is perceived as /t/, and spoken [d] as
/d/. (Corresponding responses will be noted by the capital letters employed
on the response sheets: T and D.) These typical responses can be referred to as
base responses. The opposing cases (spoken [t] perceived as /d/ and spoken [d]
as /t/) can be termed target responses, since they are the responses which the
token modifications are intended to provoke, or they can be referred to
simply as errors, as they would be from the functional viewpoint of
communication.  The  tallied  proportions  of  target/error  responses  will  be 
termed  error  rates,  in  keeping  with  common  statistical  practice. 
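
In these terms, each response can be classified mechanically from two facts: whether the token was modified, and whether the listener gave the base or the target response. The following is a minimal sketch. The hit and false-alarm rates are the error rates for signal and noise tokens respectively. The conversion to d' = z(H) - z(F) is the standard signal detection formula (Chap. 3), not a description of this study's own tabulation. The function names are hypothetical, and so is the clipping of rates away from 0 and 1, which is needed because z is infinite there.

```python
from statistics import NormalDist


def classify(modified, target_response):
    """SDT category of one response.

    modified:        True for a signal token (cues modified), False for noise
    target_response: True if the listener gave the target (error) response,
                     e.g. T for a spoken [d]
    """
    if modified:
        return "hit" if target_response else "miss"
    return "false alarm" if target_response else "correct rejection"


def error_rate(target_flags):
    """Proportion of target/error responses in a list of booleans."""
    return sum(target_flags) / len(target_flags)


def d_prime_from_rates(hit_rate, fa_rate, n_signal, n_noise):
    """d' = z(H) - z(F), with rates clipped to [1/(2N), 1 - 1/(2N)] (assumed)."""
    h = min(max(hit_rate, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    f = min(max(fa_rate, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(h) - z(f)


print(classify(True, True), "/", classify(False, False))  # hit / correct rejection
print(error_rate([True, False, False, False]))             # 0.25
```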

The six word classes, {/ditit/, ..., /dudut/}, were constructed to focus
on /t, d/ in /i, a, u/ environments, so those classes of tokens will be termed
"t," "d," or "ti," "ta," "tu," "di," "da," "du." Modification levels will be
referred to as A2, A4, and A6, according to the number of standard deviations
of modification, and the unmodified tokens as A0 or "raw." Original contrast
levels will be referred to as High, Med, and Low. These two factors will be
referred to as "Mods" for contrast reduction and "Origin" (or sometimes "Orgn")
for original contrast. Modification types will be "dur" for duration and "amp"
for amplitude. The four listener groups, Naive Natives, Naive Foreigners,
Linguist Natives, and Linguist Foreigners, will be referred to by their
initials, NN, NF, LN, and LF, and the pools comprising both naives and
linguists will be termed "AllNat" and "AllFor." These terms are used in
combination, so that the four cues and their associated phenomena are
typically designated by DAmp, DDur, TAmp, and TDur (with capitalization added
to aid reader recognition). We can also refer, for example, to NN TDur A6
results.


Individual  tokens  may  be  specified  by  their  unique  serial  number; 
Figure 5.1 demonstrates label locations on (premodification versions of) da09
and ta07. In Appendix B (the only place where individual tokens are
specified), the code reduces amp and dur to the single letters "a" and "d,"
substitutes "r" (short for "raw") for A0, and dispenses with the A in the
modification levels. As a result, [didit] number 18, which was lengthened by 2
standard deviations, is referred to as di18d2.
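
The Appendix B code is regular enough to parse mechanically. The following is a hypothetical parser. It assumes labels take the form word class + serial number + modification type + level, as in di18d2. Because the exact spelling of raw labels is not shown here, it also assumes that an unmodified token carries "r" in the level position, with the type letter optional.

```python
import re

# hypothetical grammar: word class, serial number, then type+level or raw "r"
LABEL = re.compile(
    r"^(?P<cons>[td])(?P<vowel>[iau])(?P<serial>\d+)"
    r"(?:(?P<type>[ad])(?P<level>[246])|(?P<rawtype>[ad])?r)$"
)

TYPES = {"a": "amp", "d": "dur"}


def parse_label(label):
    """Split an Appendix B token code into its fields (assumed format)."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized token label: {label!r}")
    type_code = m["type"] or m["rawtype"]
    return {
        "word_class": m["cons"] + m["vowel"],  # e.g. "di" for [didit]
        "serial": int(m["serial"]),
        "mod_type": TYPES.get(type_code),      # None if no type letter
        "level": f"A{m['level']}" if m["level"] else "A0",
    }


print(parse_label("di18d2"))
# {'word_class': 'di', 'serial': 18, 'mod_type': 'dur', 'level': 'A2'}
```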




7.2  Coding  the  Response  Sheets 

Each  administration  of  the  test  to  each  subject  produced  one  response 
sheet  with  six  columns  of  circled  responses,  where  each  column  represents 
responses  to  1/5  of  the  stimuli.  (Recall  that  the  last  column  represents 
responses  to  a repetition  of  the  first  fifth  of  the  stimuli  presented.  See  the 
next  section  for  discussion.)  Since  approximately  equal  proportions  of  each 
subject  group  began  their  tests  with  each  of  the  5 fifths  of  the  presentation 
order,  we  can,  for  mnemonic  purposes,  refer  to  the  subjects,  the  presentation 
order,  the  fifths  of  the  data,  and  the  columns  of  responses  all  by  the  same 
term.  The  A subjects  heard  the  stimuli  in  the  A order,  and  their  response 
columns  represent  A,  B,  C,  D,  E,  A sets  of  data;  the  B subjects  heard  the  B 
order  and  their  responses  are  in  B,  C,  D,  E,  A,  B columns,  etc.  For  the  half  of 
the subjects who heard the reverse stimulus order, the E' subjects heard the E'
order  and  their  responses  are  E',  D',  C',  B',  A',  E'  columns,  the  D'  subjects 
heard  the  D'  order  and  their  responses  are  D',  C',  B',  A',  E',  D'  columns,  etc. 

This  organization  of  the  data  into  fifths  greatly  simplified  the 
bookkeeping  aspects  of  scoring  the  response  sheets  and  recording  the  data. 
Only two keys were required: a "forward" key with A-E column answers, and
a "reverse" key with E'-A' answers. These two keys were made on overhead
projector transparency material, so that scoring each response
sheet consisted of the following steps:

Choose the key for the correct order,

Align  the  appropriate  key  column  to  the  first  response  column, 

Scan  visually  through  the  key  to  the  subject's  responses, 

Mark  the  incorrect  responses  on  the  response  sheet,  and 

Shift  columns  and  repeat  until  done. 
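
The overlay procedure amounts to comparing each response column with the key column for the corresponding fifth. The following is a hypothetical sketch with several assumptions. Keys are stored per fifth as lists of the intended phonemes ("T" or "D"). Missing responses are recorded as None. Each flagged response is a target (error) response. Because the backward order is the forward order played in reverse, each reverse key column is taken to be the forward column reversed. None of these data structures comes from the original procedure.

```python
GROUPS = "ABCDE"


def column_groups(start, direction):
    """Groups answered in columns 1-6 (the opening group is repeated)."""
    order = GROUPS if direction == "forward" else GROUPS[::-1]
    i = order.index(start)
    rotated = order[i:] + order[:i]
    return list(rotated) + [rotated[0]]


def score_sheet(responses, forward_key, start, direction):
    """Return (column, row) positions of target/error responses.

    responses:   six columns of 29 entries, "T", "D", or None (no answer)
    forward_key: dict mapping group letter to 29 intended phonemes,
                 in forward presentation order
    """
    errors = []
    for col, group in enumerate(column_groups(start, direction)):
        key = forward_key[group]
        if direction == "backward":
            key = key[::-1]  # a backward group is played in reverse (assumed)
        for row, (answer, expected) in enumerate(zip(responses[col], key)):
            # unanswered items are lost data, not errors
            if answer is not None and answer != expected:
                errors.append((col, row))
    return errors
```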




Thus,  the  plan  of  dividing  the  data  into  fifths,  distributing  the  subjects  across 
those  fifths  as  starting  points,  and  dividing  the  response  sheet  into 
corresponding  columns,  all  facilitated  the  scoring  and  bookkeeping  process. 
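The column bookkeeping described above can be sketched as follows, purely as an illustration. The original scoring was done by hand with transparencies. The function names and the representation of keys and responses as lists of "T"/"D" labels are hypothetical.

```python
# Hypothetical sketch of the response-column bookkeeping.
FIFTHS = ["A", "B", "C", "D", "E"]

def column_sequence(start, reverse=False):
    """Return the key-column labels matching a subject's six response columns.

    start:   the fifth ("A".."E") with which the subject began the test
    reverse: True for subjects who heard the reverse order (primed labels)
    """
    order = FIFTHS[::-1] if reverse else FIFTHS
    i = order.index(start)
    seq = order[i:] + order[:i]      # rotate so the starting fifth comes first
    seq = seq + [start]              # sixth column repeats the first fifth
    return [label + "'" for label in seq] if reverse else seq

def score_column(responses, key):
    """Mark each response True (correct), False (incorrect) or None (blank)."""
    return [None if r is None else r == k for r, k in zip(responses, key)]
```

For a B subject, `column_sequence("B")` gives B, C, D, E, A, B, and for a D' subject `column_sequence("D", reverse=True)` gives D', C', B', A', E', D', matching the orders listed above.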

The  only  notable  problem  with  this  system  was  that  some  of  the 
subjects  lost  their  place  in  the  answer  sheets  when  moving  from  the  bottom 
of  one  column  to  the  top  of  the  next.  There  was  a tendency  to  overshoot, 
moving  not  to  the  following  column,  but  to  the  one  after.  Some  of  the 
subjects  managed  to  recover  from  their  mistake,  occasionally  by  use  of  the 
realignment  beep  and  the  corresponding  line  in  the  response  sheet.  More 
commonly they kept on in the wrong column, realized their mistake when
the stimuli continued even though the response sheet had run out, and recorded
whatever  responses  remained  when  they  found  the  blank  column  they  had 
skipped.  On  the  whole,  the  realignment  beep  seems  not  to  have  helped 
realignment,  though  it  may  have  reassured  the  great  majority  who  stayed  in 
order  all  through  the  test. 

The  skipped  column  problem  always  caused  some  loss  of  data  due  to 
the  subject's  momentary  inability  to  record  a response.  Inattention, 
distraction,  and,  presumably,  unresolvable  indecision,  all  caused  occasional 
lost  responses  in  the  same  way.  The  skipped  column  problem,  however,  also 
caused  significant  loss  of  data  due  to  the  researcher's  inability  to  determine 
which  stimuli  the  responses  were  responding  to.  Once  the  strict 
correspondence  between  the  stimuli  and  responses  was  lost,  it  became 
necessary  to  identify  the  sequences  of  responses  before  being  able  to  score 
them.  Since  most  of  the  responses  were  correct,  the  sequences  were  typically 
quite clear, except at the ends, where a few responses had to be discarded as
uncertain. However, for a few subjects with less apparent repair strategies,
large portions of the data had to be sacrificed. Ultimately, only those data
that were certain were used, and blanks were substituted for the rest.

When  the  response  sheets  had  been  scored,  all  the  responses  were 
entered  into  a Microsoft  Excel  spreadsheet  file.  There,  a few  other 
specifications  were  added,  including  the  subject's  name,  the  presentation 
order,  and  the  crucial  subject  group.  The  end  result  was  a file  containing  all 
the  valid  responses  to  the  various  stimuli,  appropriately  indexed  to  the 
subject  and  presentation  information. 

7.3  Validating  the  Initial  Responses 

The initial fifth of the stimuli was repeated at the end of the
perception  test  in  order  to  guard  against  the  loss  of  data  in  case,  despite  the 
preliminary  training  with  the  talker's  voice  and  the  response  task,  the 
subjects  had  difficulties  accommodating  to  their  task  and  either  failed  to 
respond  or  performed  poorly  at  the  beginning  of  the  test.  If  the  subjects  failed 
to  respond,  the  repetitions  prevented  loss  of  the  data.  If  they  performed 
demonstrably  worse  on  the  initial  than  on  the  final  presentations,  the  initial 
set  could  even  be  discarded.  It  turned  out  that  almost  no  data  was  lost.  This 
section  tests  whether  the  initial  performance  was  adequate. 

A new  Excel  file  was  constructed  containing  only  the  repeated  data. 
Two error rates were calculated for each subject, $ER^i$ for the initial
presentation and $ER^f$ for the final. (Missing data were not counted in the
calculations.) In the crucial situation, if a subject performed worse on the
beginning stimuli than on the end, $ER^i$ would be greater than $ER^f$; otherwise,
they would be equal or $ER^f$ would be greater. The difference $ER^f - ER^i$ was
calculated for each subject, so the undesirable cases would have negative
values. These differences were used in statistical calculations to test the
research hypothesis, where μ is the population mean of the difference:

$$H_a: \mu(ER^f - ER^i) < 0$$

signifying that the subjects performed worse at the beginning of the
perceptual test, against the null hypothesis:

$$H_0: \mu(ER^f - ER^i) \ge 0$$

signifying that they did not perform worse at the beginning.

The  first  test  performed  was  a (one-tailed)  t-test  for  the  whole  subject 
population. This test failed to support $H_a$ at a level of α = 0.05. This result
indicated that the initial data did not need to be eliminated outright, but it
was still possible that the four subject groups performed differently, and that one
or more of the subject groups began the test with a poor performance.
Accordingly, an analysis of variance with subject group as the factor was
performed, along with separate t-tests for the data from each group. None of
the tests reached the significance level of α = 0.05, so the research hypothesis
was not supported, and all the data from the beginning of the test were retained.
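The per-subject error rates and the one-tailed test on their differences can be sketched as below. This is a minimal stdlib-only illustration with hypothetical function names. It computes only the t statistic and its degrees of freedom; the critical value has to come from a t table or a statistics library (for example SciPy's one-sample t-test with a one-sided alternative), and the original computations may have been done differently.

```python
from math import sqrt
from statistics import mean, stdev

def error_rate(responses, key):
    """Error rate over answered items only; None marks a missing response."""
    answered = [(r, k) for r, k in zip(responses, key) if r is not None]
    if not answered:
        return None
    wrong = sum(1 for r, k in answered if r != k)
    return wrong / len(answered)

def paired_t(initial_rates, final_rates):
    """t statistic and df for the per-subject differences ER^f - ER^i.

    Ha (mean difference < 0) is supported at level alpha only when t falls
    below the negative one-tailed critical value for df degrees of freedom.
    Requires at least two subjects and non-identical differences.
    """
    diffs = [f - i for i, f in zip(initial_rates, final_rates)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

A subject who did worse at the start contributes a negative difference, so only a clearly negative t would have justified discarding the initial fifth.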

7.4  Raw  Responses  and  Error  Rates 

With  the  analysis  in  the  preceding  section  done,  the  final  tallies  of 
responses  can  be  determined.  Since  the  raw  responses  are  quite  unrevealing 
until  organized  to  extract  the  appropriate  information,  they  are  reported  in 
Appendix  B.  There,  they  are  reported  token  by  token,  broken  down  into  the 
four  subject  groups  and  pooled  for  overall  tallies  per  token.  Analysis  of  the 
results  is  reported  in  the  next  chapter. 

The error rates for the 4 subject groups are reported in Table 7.1 and
shown in Figure 7.1. The table also reports the overall error rate of 0.025752,
or approximately 2.6%. For the purpose of these figures, all functional errors
(spoken [d] perceived as "T" and spoken [t] perceived as "D") are counted as
errors, regardless of whether or not the tokens were edited to encourage
misperception.¹
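The functional error rate defined above can be sketched as follows. This is an illustration only: the (spoken, perceived) pair representation and the function name are hypothetical, and missing responses are assumed to be excluded from the denominator, as in the rest of this chapter.

```python
def functional_error_rate(trials):
    """Pooled functional error rate.

    trials: (spoken, perceived) pairs such as ("d", "T"); perceived is None
            when the listener gave no usable response.
    An error is any response naming the other phoneme, whether or not the
    token was edited to encourage misperception.
    """
    answered = [(s, p) for s, p in trials if p is not None]
    errors = sum(1 for s, p in answered if s.lower() != p.lower())
    return errors / len(answered)
```

Because the grand total pools every valid response, larger groups weigh more heavily in it. The unweighted mean of the four cells in Table 7.1 would be about 0.0245, not 0.025752.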

Table 7.1

Numerical Data on Overall Error Rates

Subject Classes       Native (N)    Foreign (F)
Naive (N)             0.029479      0.031926
Linguist (L)          0.017534      0.018949

Grand Total: 0.025752


[Figure: bar chart "Error Rates by Group", error rate (y-axis, 0 to 0.05) for Native and Foreign listeners (x-axis), with separate bars for Naive and Linguist listeners.]

Figure 7.1. A Graphic View of Group Error Rates

The figure and table make it appear that foreigners performed only
slightly worse than native speakers, and that linguists performed noticeably
better than naives. While these appearances may actually be the case, no
calculations were done to test the statistical significance of the findings, for
two reasons. First, the overall performance levels of native vs. nonnative
and linguist vs. nonlinguist listeners are of no importance to this research,
which is designed to focus on different levels of use of the same acoustic cues
within and across subject groups. Indeed, one of the reasons for using signal
detection theory is to remove overall performance from consideration in
measuring cue importance. Second, the talker was a linguist and was well
known to many of the linguist listeners, so the superior performance by the
linguists could easily be due to simple familiarity with the talker's voice and
speech, rather than to any linguistic training or sophistication on the part of
the listeners.

¹ See section 4.1 for an explanation of the difference between functional errors
and SDT errors, i.e., misses and false alarms.

The  main  conclusion  to  be  drawn  from  these  numbers  is  the 
extraordinarily  high  likelihood,  above  95%,  of  correct  phonemic  percepts  by 
all  4 groups  of  listeners,  even  in  the  absence  of  any  lexical  or  semantic 
information  whatsoever. 


7.5 The Problem of Statistical Analysis


The design of this experiment is extremely complex. The stimuli
represent:

2 phonemes (/t~d/),

3 vowel contexts (/i, a, u/),

2 cues per phoneme (a duration cue and an amplitude cue),

2 factors per cue (contrast reduction and original contrast), and

3 levels per factor (Δ2, Δ4, and Δ6 std. dev. of contrast reduction and
high, medium, and low original contrast).


The 3x3 factorial array is one of the two prime focuses of this study. The other
prime focus employs the factorial stimulus array to reveal differences in cue
usage patterns across subject groups, where the perceptual test subjects are
classified according to a 2x2 array based on native language and metalinguistic
experience. Each subject hears and responds to each stimulus once
(approximately²), in a forced-choice classification as one of the two phonemes,
and the perceptual "error" rates upon which the study is based are extremely
low.
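The size of the stimulus design can be made concrete by enumerating the factor combinations listed above. This sketch assumes a fully crossed design in which every combination is realized exactly once. It leaves out the unedited boundary tokens, and the actual stimulus inventory may differ. The variable names are hypothetical.

```python
from itertools import product

PHONEMES = ["t", "d"]
VOWELS = ["i", "a", "u"]
CUES = ["Dur", "Amp"]                  # duration and amplitude cues
REDUCTION = ["Δ2", "Δ4", "Δ6"]         # contrast reduction, in std. dev.
ORIGINAL = ["high", "medium", "low"]   # original contrast

def design():
    """All factor combinations, assuming a fully crossed design."""
    return list(product(PHONEMES, VOWELS, CUES, REDUCTION, ORIGINAL))
```

Under this assumption there are 2 x 3 x 2 x 3 x 3 = 108 edited cells, with the 4 phoneme-cue pairs corresponding to DAmp, DDur, TAmp, and TDur. Each listener contributes only one binary response per cell.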

Consider  the  kinds  of  hypotheses  one  would  want  to  test  within  such  a 
design,  and  for  which  one  would  wish  a measure  of  statistical  validity: 

Cue  A has  an  effect  on  perception. 

The  effect  of  Cue  B changes  across  Factor  X. 

The  effects  of  Cue  C and  Cue  D are  different. 

The  patterns  of  cue  effect  are  different  for  Subject  Groups  M and  N. 

The  effects  of  Cue  E are  different  for  Subject  Groups  P and  R. 

General  Strategy  for  Statistics 

Among  the  tools  experimenters  are  trained  to  use  in  such  complex 
situations,  the  ANOVA  is  typically  the  first  applied,  to  test  whether  indeed 
the  various  factors  have  any  effect  to  begin  with.  For  those  factors  which 
show  an  effect,  post-hoc  tests  are  then  applied,  to  test  for  differences  in  levels 
of  the  factors,  or  to  test  particular  interactions  of  interest  to  the  researcher. 
Such  tests  include  t and  z tests,  as  well  as  more  esoteric  procedures  such  as 
Duncan's  multiple  range  test.  Such  a strategy  would  allow  investigation  of 
all  the  hypothesis  types  above. 

Unfortunately, the ANOVA and its typical accompaniments cannot be
validly used here, because they crucially require parametric data for their
application. In other words, each data point in the sample must be a
numerical measurement. This experiment is based instead on relationships
between binomial proportions (a.k.a. rates), where each data point in the
sample is a choice between two categorical alternatives (classically "yes or no,"
or success or failure, but here /t~d/). This excludes the possibility of the
normal use of ANOVA and the many other well-known parametric statistical
tests.

² Recall from section 6.5 that each of the subjects responds twice to one of the
fifths of the data.

The treatment of proportions in college-level textbooks³ is typically
confined  to  the  estimation  and  test  of  a proportion  and  of  a difference 
between  two  proportions.  It  would  theoretically  be  possible  to  do  pairwise 
comparisons  of  all  the  appropriate  proportions  in  this  experiment,  but  the 
density  and  complexity  of  that  strategy  virtually  guarantees  difficulties  in 
interpretation,  plus  the  number  of  tests  required  would  give  a high 
likelihood  of  undetected  type  I or  type  II  error  in  the  results.  This  possible 
strategy  was  not  adopted. 
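The textbook test for a difference between two proportions, and the cost of running many such tests, can be sketched as follows. These are standard large-sample formulas, not a procedure used in this study. The familywise figure assumes independent tests, and the function names are hypothetical.

```python
from math import comb, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 == p2 (textbook large-sample test).

    Undefined when the pooled proportion is exactly 0 or 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def familywise_error(alpha, n_tests):
    """Chance of at least one type I error among independent tests."""
    return 1 - (1 - alpha) ** n_tests

def n_pairwise(k):
    """Number of pairwise comparisons among k proportions."""
    return comb(k, 2)
```

Even 10 independent comparisons at α = 0.05 carry roughly a 40% chance of at least one spurious rejection, which is why an exhaustive pairwise strategy was judged unwise.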

An  important  share  of  the  literature  on  proportions  deals  with  analysis 
of  contingency  tables.  These  are  experiments  where  each  item  in  the  sample 
is  cross-classified  according  to  two  or  more  categories  on  each  of  two  or  more 
factors. The classification of subjects in this study might have been
appropriate for such an analysis, since each subject was either a native or a foreign
speaker of French and, at the same time, either a naive or a linguist. Of the
four cells defined by the 2x2 table, each subject fits exactly one. Unfortunately,
the proportions of subjects so classified are of no research interest in this study,
and the 3x3 factorial array of contrast reduction and original contrast is not a
contingency table. Rather than hearing a stimulus and classifying it in one
cell of the array, the subjects hear a stimulus from each cell, and classify it
according to a third "factor": phoneme identity as /t/ or /d/. So this strategy
was never a possibility.

³ This refers to texts aimed at nonstatisticians in service courses preparing
them for experimental or survey research in their own fields.


Ultimately,  two  promising  possibilities  were  found.  The  first  is  the 
analysis  of  "gradient  in  proportions."  Fleiss  (1980)  is  essentially  a thorough 
handbook  on  the  possible  applications  of  contingency  table  analysis,  and  thus 
fruitless  for  the  study  at  hand.  However,  his  section  9.2,  "Gradient  in 
Proportions:  Samples  Quantitatively  Ordered,"  presents  a strategy  that  could 
be  extended  to  this  study.  A linear  model  is  used  to  furnish  predictions  of  the 
proportions,  the  differences  between  the  expected  and  observed  proportions 
are  calculated,  and  then  chi-square  statistics  are  used  to  test  whether  the  trend 
is  linear  and  whether  the  slope  differs  from  0.  This  method  has  clear 
affinities  with  the  linear  regression  methods  of  parametric  statistics.  A 
caveat:  Fleiss  only  vouches  for  its  validity  when  the  proportions  are  not  close 
to  0 or  1,  and  those  in  this  study  are  very  close  to  0. 
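The gradient-in-proportions idea can be sketched with the usual chi-square for linear trend in the Cochran-Armitage form. This is an illustration under the assumption that it matches the general approach of Fleiss's section 9.2. It is not his notation, and it was not applied in this study. Like Fleiss's method, it relies on large-sample approximations that become unreliable when the proportions are near 0.

```python
def trend_chi_square(x, n, s):
    """Chi-square for linear trend in ordered proportions.

    x: counts of target responses in each ordered sample
    n: sample sizes
    s: quantitative scores for the samples (e.g. 2, 4, 6 std. dev.)
    Returns (chi2_slope, chi2_departure): the slope statistic has 1 df, the
    departure from linearity has len(x) - 2 df.
    """
    N = sum(n)
    p_bar = sum(x) / N
    q_bar = 1 - p_bar
    s_bar = sum(ni * si for ni, si in zip(n, s)) / N
    p = [xi / ni for xi, ni in zip(x, n)]
    sxx = sum(ni * (si - s_bar) ** 2 for ni, si in zip(n, s))
    sxy = sum(ni * (pi - p_bar) * (si - s_bar) for ni, pi, si in zip(n, p, s))
    chi2_total = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / (p_bar * q_bar)
    chi2_slope = sxy ** 2 / (sxx * p_bar * q_bar)
    return chi2_slope, chi2_total - chi2_slope
```

The total heterogeneity chi-square is split into a slope part and a remainder, mirroring the decomposition of sums of squares in linear regression.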

The  second  promising  possibility  is  the  use  of  logistic  or  log-linear 
models.  These  are  mentioned  by  Fleiss,  and  were  recommended  in 
consultations  with  researchers  and  a statistician.  According  to  Fienberg 
(1977),  these  models  are  based  on  the  use  of  log  transforms  of  proportions 
(and  other  types  of  data),  which  then  allow  the  transformed  data  to  be  treated 
with  ANOVA-like  procedures.  Another  caveat:  Fienberg's  exposition 
concentrates  on  applying  the  new  methods  to  the  old  problem  of  contingency 
tables,  and  as  stated  above  the  data  in  this  study  are  not  so  constituted. 
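The transformation at the heart of logistic models can be sketched as below. This shows only the log-odds transform and its usual large-sample (delta-method) variance. It does not fit a logistic or log-linear model, and the function names are hypothetical.

```python
from math import log

def logit(p):
    """Log-odds transform of a proportion strictly between 0 and 1."""
    return log(p / (1 - p))

def logit_variance(p, n):
    """Large-sample (delta-method) variance of logit(p_hat) from n trials."""
    return 1 / (n * p * (1 - p))
```

The variance grows sharply as p approaches 0, so the very low error rates of this study would strain these methods as well.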

Thus,  both  the  gradient  approach  and,  especially,  the  logistic  family  of 
approaches  are  worthy  of  further  attention.  However,  an  adequate 
investigation  of  their  validity  and  application  to  this  study  would  in  itself  be  a 
daunting  undertaking.  A tactical  decision  was  made  to  set  that  task  aside  for 
future  action  and  to  apply  in  the  interim  a procedure  which  is  less  ambitious, 
but  more  easily  implemented:  confidence  intervals. 




Confidence  Intervals 

A sample  statistic  is  a point  estimate  of  the  true  parameter  that  exists  in 
the population from which the sample is drawn. A confidence interval (or
CI) is a range constructed around the sample statistic, within which one can
have a specific level of confidence that the true population parameter is to be
found. CIs are mathematically related to hypothesis tests (z and t tests), and
they are sometimes used as substitutes for such tests, with the researcher
rejecting at the α level (.05 for a 95% CI) all hypotheses assigning the parameter
any value outside the CI.⁴ CIs are fairly easy to calculate, to represent
graphically, and to interpret. And CIs can be constructed both on proportions
and on d' values.
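A large-sample confidence interval for a proportion can be sketched as follows. This is the standard normal-approximation (Wald) interval. Whether the tables in Appendix C used exactly this form is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Large-sample (Wald) confidence interval for the proportion x/n."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half
```

For 10 errors in 100 responses this gives roughly 0.041 to 0.159. Any hypothesized rate outside that range would be rejected by the corresponding two-tailed test at α = 0.05.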

There  are  limits,  however,  to  the  validity  of  constructing  CIs  on 
proportions:  the  samples  must  be  large  enough,  and  the  proportions  must  not 
be  "too  close"  to  0 or  1.  These  two  concepts  are  related,  in  that  the  larger  the 
sample, the closer the proportion can be to 0 or 1 and still permit a CI. This
relationship has given rise to criteria that must be satisfied before a CI can be
calculated,  but  the  literature  provides  several  different  versions  of  the 
criterion.  The  most  liberal  criterion  found  was  that  of  Mendenhall  and 
Beaver  (1991),  which  requires  that  the  range  of  2 std.  dev.  on  either  side  of  the 
proportion  be  contained  within  the  interval  0 to  1. 

This  criterion  was  applied  to  all  the  cells  of  every  analysis  in  this  study, 
and  the  results  are  included  in  the  data  tables  in  Appendix  C.  The  notation 
"NV,"  signifying  "not  valid,"  is  recorded  for  every  case  where  the  data  in  the 
cell  failed  to  meet  the  criterion. 
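The validity check can be sketched as follows. It assumes that "contained within the interval 0 to 1" excludes the endpoints, so a cell with no errors at all is marked NV. The function name is hypothetical.

```python
from math import sqrt

def ci_is_valid(x, n):
    """Mendenhall-Beaver style check: p +/- 2 SD must lie strictly inside (0, 1).

    Cells failing this check would be recorded as "NV" (not valid).
    """
    p = x / n
    two_sd = 2 * sqrt(p * (1 - p) / n)
    return p - two_sd > 0 and p + two_sd < 1
```

With about 50 responses per stimulus (the size of the NN group), a single error (p = 0.02) gives 2 SD ≈ 0.040, so the range dips below 0 and the cell is NV. At least 4 errors in 50 are needed to pass, which is rarely reached at these error rates.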


⁴ There are risks associated with CIs that make this procedure worth avoiding
if possible. See Fleiss (1980, pp. xii & 29-31).


Study  of  the  tables  shows  that  the  NV  is  ubiquitous  within  the 
fundamental  3x3  matrices,  even  in  those  of  the  most  populous  subject  group, 
NN  (naive  natives).  There  is  enough  data  in  the  pooled  cells  (across  the  two 
factors,  and  for  the  boundary  tokens)  that  the  notation  is  rare  there  for  the 
NN and NF (naive foreign) groups. For the Group Pool analysis ("All
Native" and "All Foreign," which discard the distinction between naives
and linguists), NV occurs only in two of the four "Bounds" pools in "All
Foreign."  Since  the  "Vowel"  analysis  pools  so  massively,  the  NV  is 
uncommon  in  any  but  the  "Bounds"  pools. 

Thus,  a large  portion  of  the  desired  CIs  cannot  be  validly  constructed 
due  to  the  extremely  low  error  rates.  Without  those  CIs,  it  would  be 
impossible  to  perform  an  important  number  of  the  desired  error  rate 
comparisons  between  groups  and  between  cues.  Therefore,  the  strategy  of 
constructing  CIs  on  the  response  proportions  was  abandoned  in  favor  of 
constructing  them  on  the  d'  values. 

The  low  error  rates  cause  difficulties  for  the  d'  values  and  their  CIs  as 
well.  First,  the  calculation  of  d'  itself  is  rendered  impossible  in  the  case  of 
cells  with  "perfect"  performance,  i.e.,  proportions  of  0 or  1.  Researchers  apply 
a correction in such cases, and the most common correction method⁵ has the
effect  of  transferring  half  a response  from  the  1 side  to  the  0 side.  For 
example,  10  yesses  and  0 noes  would  be  treated  as  9.5  yesses  and  .5  noes.  This 
correction  method  is  used,  only  in  pool  cells,  seven  times  in  the  Group 
analysis  and  once  in  the  Group  Pool  analysis.  These  cases  are  indicated  by  a 
0.5  in  the  row  containing  the  number  of  target  (error)  responses  for  each  cell 
so  treated  in  the  Appendix  C data  tables.  These  are  the  only  cases  where 
fractional  numbers  occur  in  what  would  have  been  integer  counts. 
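The half-response correction and the resulting d' can be sketched as follows. This assumes that, as described in this study, hits are target (error) responses to edited tokens and false alarms are target responses to the boundary tokens. The function names are hypothetical.

```python
from statistics import NormalDist

def corrected_rate(x, n):
    """x/n, moving half a response away from 0 or n when performance is perfect."""
    if x == 0:
        x = 0.5
    elif x == n:
        x = n - 0.5
    return x / n

def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(H) - z(F), with the half-response correction applied to each rate."""
    z = NormalDist().inv_cdf
    return z(corrected_rate(hits, n_signal)) - z(corrected_rate(false_alarms, n_noise))
```

As in the example above, 10 yesses and 0 noes become 9.5 and 0.5, i.e. a rate of 0.95 instead of the unusable 1.0. When the edited tokens attract fewer errors than the boundary tokens, d' comes out negative.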


⁵ Other methods exist. See Macmillan and Creelman, 1991, p. 10.


The second difficulty with these low error rates and d' directly concerns
the CIs. It turns out that the variance associated with z-transforms of
proportions increases dramatically for proportions near 0 and 1 (Macmillan &
Creelman, 1991, p. 272). Since d' is a difference between two such z-scores,
and since the variances of independent variables sum when they are subtracted
(or added), the variance of d' will be correspondingly larger. That very large
variance enters into the construction of CIs around the d' values, and should
result in very large CIs. The larger the CIs, the more the CIs will overlap, and
the fewer will be the cases where the CIs constitute evidence for a real
difference between the conditions tested. This is a real disadvantage as to the
number and diversity of the conclusions that d' confidence intervals will
allow to be drawn from this rich and complex design, but there is a
complementary advantage: those effects which are supported by CI differences
are very likely to be real.
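The variance argument can be sketched numerically. The sketch uses the common large-sample approximation var z(p) ≈ p(1 − p) / (n φ(z(p))²), where φ is the standard normal density, as given in the signal detection literature. It is an assumption that the Appendix C calculations used exactly this form. The rates passed in are taken to be already corrected, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_variance(p, n):
    """Approximate variance of z(p) for a rate p estimated from n trials."""
    phi = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))
    return p * (1 - p) / (n * phi ** 2)

def d_prime_ci(h, n_h, f, n_f, level=0.95):
    """d' with its CI, treating z(H) and z(F) as independent so variances add."""
    d = STD_NORMAL.inv_cdf(h) - STD_NORMAL.inv_cdf(f)
    var = z_variance(h, n_h) + z_variance(f, n_f)
    half = STD_NORMAL.inv_cdf(1 - (1 - level) / 2) * sqrt(var)
    return d, d - half, d + half
```

With 50 trials, the variance of z at p = 0.02 is roughly five times its value at p = 0.5, which is why CIs built from the low error rates of this study are so wide.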

To  recapitulate,  further  research  into  statistical  methods  is  needed  to 
determine  the  statistical  procedures  most  capable  of  exploiting  the  data  in  this 
rich  and  complex  experiment  design.  These  procedures  would  be  necessary 
before  making  claims  about  the  true  sensitivities  to  cues  in  any  language  or 
environment.  In  the  interim,  the  best  available  method  appears  to  be  the 
construction  of  confidence  intervals  around  the  d'  values.  This  method 
should  be  adequate  for  the  present  project,  where  the  goal  is  primarily  to 
explore  the  promise  of  this  overall  research  strategy  for  measuring  cue 
importance. 


7.6  Results 


The  results  exist  in  two  formats,  numerical  and  graphical.  The 
numerical results are reported in the tables in Appendix C. They include the
numbers of responses in each cell, the response rates calculated therefrom,
the  pooled  rates'  z-scores  and  their  variances,  and  finally,  the  d'  values,  their 
variances,  and  the  ranges  of  their  confidence  intervals.  The  introduction  to 
the  appendix  serves  as  a guide  to  excavating  data  from  the  tables  and  also 
presents  details  pertaining  to  calculation  methods. 

This  section  presents  the  graphs  of  the  major  results:  the  d'  values  and 
their  confidence  intervals  in  an  interpretable  format  and  in  appropriate 
comparison  groups.  The  figures  are  all  plotted  in  the  same  fashion,  with  the 
three  levels  of  one  of  the  two  stimulus  factors  as  categorical  variables  on  the 
x-axis,  and  with  the  d'  values  on  the  y-axis.  Lines  connect  the  series  of  data 
points  across  the  three  levels  of  the  stimulus  factors,  and  error  bars  indicate 
the  extent  of  a 95%  confidence  interval  constructed  on  the  d'  data  point.  The 
series  are  slightly  offset  from  each  other  so  that  the  error  bars  will  not  lie  on 
top  of  each  other,  and  the  confidence  intervals  will  remain  interpretable. 

The first half of the graphs are presented here and concern the contrast
reduction stimulus factor. The second half concern the other factor, original
contrast level, and are presented in Appendix D. Within each half, four sets
of graphs are presented: Group (Fig. 7.2), Group Pool (Fig. 7.3), Cue (Fig. 7.4),
and Vowel (Fig. 7.5). The Group set comprises the 4 graphs for the 4 subject
groups (NN, NF, LN, and LF), and each graph plots that group's use of the 4
cues tested. The Group Pool set is similar to the Group set, but eliminates the
distinction between naives and linguists, so it comprises 2 graphs: All Natives
and All Foreign. The Cue set contains the same data as the Group set, but it
examines subject groups within cue rather than cues within subject group, so
it comprises 4 graphs for the 4 cues (DAmp, DDur, TAmp, and TDur). The
Vowel set is included for the sake of thoroughness, and comprises 4
graphs, /i, a, u/ and All Vs, with each graph plotting the 4 cues for that vowel
condition. Note that since the Vowel set pools across all 4 subject groups, it
encompasses, and presumably camouflages, important differences in listening
strategy across subject groups.

Discussion  and  analysis  of  these  results  is  reserved  for  the  next  chapter. 


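[Figure: line charts of d' values (y-axis) across the three levels of contrast reduction (x-axis), one panel per subject group (NN, NF, LN, LF), with series DAmp, DDur, TAmp, and TDur.]

Figure 7.2. Contrast Reduction Results by Group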


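[Figure: line charts of d' values across the three levels of contrast reduction, one panel each for All Natives and All Foreign, with series DAmp, DDur, TAmp, and TDur.]

Figure 7.3. Contrast Reduction Results by Group Pool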


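[Figure: line charts of d' values across the three levels of contrast reduction, one panel per cue (DAmp, DDur, TAmp, TDur), with one series per subject group.]

Figure 7.4. Contrast Reduction Results by Cue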


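[Figure: line charts of d' values across the three levels of contrast reduction, one panel per vowel condition (/i/, /a/, /u/, All Vs), with series DAmp, DDur, TAmp, and TDur.]

Figure 7.5. Contrast Reduction Results by Vowel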


CHAPTER  8 

ANALYSIS  AND  COMMENTARY 


The  preceding  chapter  concluded  with  the  presentation  of  the  results  of 
the  perceptual  tests,  in  the  graphic  form  of  line  charts.  (The  numeric  data  are 
reported  in  Appendix  C.)  This  chapter  concerns  the  examination  and  analysis 
of  those  charts  and  the  cue  sensitivity  functions  they  report. 

8.1  General  Aspects 

Cursory  perusal  of  the  charts  as  a group  reveals  several  general 
characteristics  worth  noting. 

The  first  and  perhaps  most  important  general  finding  is  that  the  slopes 
of  the  functions  across  the  three  levels  of  the  Contrast  Reduction  factor  (Figs. 
7.2  - 7.5)  move  in  the  right  direction.  This  shows  that  the  listeners' 
sensitivity  increased  as  the  cues'  measurements  moved  away  from  their 
natural  distributions  toward  the  characteristics  of  the  opposing  phoneme. 
Thus,  the  most  rudimentary  expectation  for  this  study  was  met.  The  charts 
for the Original Contrast factor (Figs. D.1 - D.4, in App. D) are clearly different,
seemingly showing every possible configuration except a continual decrease.

It seems the Origin factor performed its intended function of providing a
controlled base from which to launch the modifications, but the resulting
percepts are dictated by the modification rather than by the premodification
configuration. Accordingly, the remaining discussion will concern Contrast
Reduction rather than Origin.




Second, the functions seem to start as often as not in the slightly
negative values of d' for the first level of the stimulus factor. A negative d'
here indicates that the unedited boundary tokens were more often
misperceived as the opposing phoneme than were certain of the stimuli
which had been edited specifically to encourage that misperception. This is
an undesirable situation, since it suggests a kind of "calibration" error in the
location of the d' functions. The negative d' values, and perhaps the entire
associated function, may be underestimated.

One can reasonably speculate that this error is caused by using
boundary tokens to supply the false alarm rates used to calculate all the d'
values. Recall that the boundary tokens are the two tokens with the lowest
original contrast, and presumably the highest natural likelihood of
misperception. They represent the weakest tail of what is theoretically,
assuming normality of cue measurements, a distribution that is six standard
deviations wide. If a high original contrast token, by definition from the
strongest region of that distribution, is weakened through editing by only 2
standard deviations, that token may not even end up on the weak side of the
mean, and it could easily be less frequently misperceived than the raw
boundary tokens. A possible remedy will be discussed below (section 8.7).
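A hypothetical worked example makes the positional argument concrete. Measure each token's cue value in standard deviations from the mean of its phoneme's distribution, positive toward the strong side. The positions chosen below (+2 for a high original contrast token, about −2.5 for a boundary token) are assumptions for illustration, not measurements from this study:

$$
z_{\text{edited}} = z_{\text{original}} - \Delta, \qquad z_{\text{original}} = +2,\ \Delta = 2 \;\Rightarrow\; z_{\text{edited}} = 0 > z_{\text{boundary}} \approx -2.5
$$

Under these assumed positions, the Δ2 token still sits at the mean, well on the strong side of the boundary token. It would therefore be expected to draw fewer misperceptions than the boundary token, which produces exactly the negative d' described above.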

Third,  the  confidence  intervals  are  behaving  as  expected,  within  certain 
limits.  Specifically,  the  confidence  intervals  which  encompass  larger 
numbers  of  responses  are  generally  smaller  than  those  which  are  based  on 
less  data.  For  instance,  in  the  Group  analyses  (Fig.  7.2),  there  are  about  50 
responses  per  stimulus  for  the  NN  group,  28  for  NF,  26  for  LN,  and  22  for  LF. 
In  the  graphs,  the  CIs  are  smaller  for  NN  than  for  the  other  3 groups,  and  the 
Group  Pool  CIs  (Fig.  7.3),  encompassing  data  from  both  NN  and  LN,  are 
smaller still. However, the number of responses is clearly not the only
important determinant, since in the NF, LF, and LN graphs there are
important discrepancies in CI length across cues. This will be discussed
further in section 8.3.
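The expected effect of group size alone can be checked with a short calculation. If the underlying rates were held fixed, each variance would scale as 1/n, so CI half-widths would scale as 1/√n. Using the approximate response counts per stimulus given above:

$$
\frac{\text{half-width}_{LF}}{\text{half-width}_{NN}} \approx \sqrt{\frac{n_{NN}}{n_{LF}}} = \sqrt{\frac{50}{22}} \approx 1.51
$$

Differences in CI length much larger than this, or differences across cues within a single group, must come from the rates themselves, since the variance of z grows steeply as a rate approaches 0.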

Fourth  and  last,  it  seems  that  even  at  the  highest  level  of  the  two 
stimulus  factors,  there  are  few  data  points  sufficiently  distant  from  one 
another  that  their  CIs  fail  to  overlap.  Maintaining  a conservative  attitude 
about  the  conclusions  justified  by  the  CIs,  we  cannot  accept  that  there  is  a 
significant  difference  in  the  cues'  usage  where  the  CIs  overlap.  Those  that  can 
be  accepted  will  be  discussed  in  section  8.6,  as  will  trends  that  fail  to  reach  the 
level  of  significance.  On  the  positive  side,  a fair  number  of  CIs  at  the  higher 
factor  levels  do  exclude  the  d'  value  of  0.  This  constitutes  evidence  for  a 
significant  effect  of  the  stimulus  factor  at  that  level,  and  reinforces  the  first 
point  above,  that  the  listeners  respond  as  expected  to  the  cue  manipulations 
performed.  The  number  of  significant  effects  would  doubtless  increase 
should  the  second  point  above,  the  calibration  problem,  be  solved. 

8.2  The  Duration  of /d/ 

One  other  general  phenomenon  must  be  considered  before  moving  on 
to  specific  analyses.  In  examining  the  information  that  can  be  gleaned  from 
individual  graphs,  it  must  first  be  remembered  that  the  DDur  cue  is  a special 
case.  Recall  (section  6.4)  that  the  cue  manipulation  for  DDur  had  the  closure 
lengthened  by  repeating  periods  of  closure  voicing.  The  lengthening  reduced 
the  duration  contrast  with  the  longer  /t/,  but  at  the  expense  of  artificially 
maintaining  the  closure  voicing  amplitude.  Thus,  the  DDur  manipulation  is 
paradoxical,  weakening  one  contrast  while  strengthening  another.  The  other 
cue  manipulations  (DAmp,  TAmp,  and  TDur)  do  not  suffer  from  such 
paradoxes. 




Consider the Cue graphs (Fig. 7.4) with this difference in mind. The
overall  slopes  of  the  functions  for  the  other  three  cues  are  similar.  The 
overall  slopes  of  DDur  are  shallower,  and  the  graph  stands  out  as  containing 
the  only  instance  (within  the  results  for  the  contrast  reduction  factor)  of  a 
descending  function,  where  a higher  level  of  contrast  reduction  results  in  a 
lower  d'  value.  In  fact,  that  descending  function  represents  the  NN  subject 
group,  both  the  largest  and  presumably  the  most  linguistically  homogeneous 
of  the  four  subject  groups.  These  observations  rather  pointedly  suggest  that 
the  listeners  did  indeed  react  with  ambiguity  to  the  mixed  message  of  the 
DDur  stimuli.  Since  the  DDur  function  occurs  in  the  other  analyses,  and  in 
particular  the  functions  in  the  Cue  set  are  simply  a cross-classification  of  the 
same  functions  in  the  Group  set  of  graphs,  it  is  important  to  examine  the 
other  charts  with  this  special  status  of  DDur  in  mind. 

8.3  A Detailed  Analysis  and  A Design  Weakness 

As  suggested  in  the  preceding  section,  the  NN  (naive  natives)  subject 
group  is  central  to  this  experiment.  NN  has  the  most  subjects  and  data,  the 
subjects are presumably the most homogeneous, and their metalinguistically
innocent monolingualism is representative of the great majority of the
French-speaking public. In fact, as this experimental design is intended to
apply  generally,  the  NN  group  represents  by  extension  the  native  speakers  of 
any  human  language.  Thus,  the  most  important  analysis  in  this  experiment 
is  that  of  the  NN  group's  use  of  cues  and  the  cue  use  differences  between  that 
group  and  the  others. 

However, before proceeding with that analysis we must understand
certain limitations in the design of the experiment, and certain types of
weakness in the data it produced. These are best illustrated by examining
certain graphs from other subject groups, and the numerical data behind
them.

Consider  the  NF  graph  (in  Fig.  7.2).  The  DDur  function,  as  noted  in  the 
previous  section,  has  a shallower  slope  than  the  other  functions.  It  also 
stands  out  strikingly  by  its  location  straddling  d'  = 1 and  by  the  size  of  its  CIs, 
which  are  much  larger  than  those  of  TDur  and  especially  DAmp.  The 
numerical  results  in  Table  8.1  (extracted  from  the  appropriate  table  in 
Appendix  C)  show  the  immediate  source  of  the  values  plotted  for  the  DDur 
function. 

We can see that the perfect performance on the A0 boundary tokens (or
0.5 errors with the correction applied) may be the source of the problems with
DDur. The extremely good performance on the boundary tokens combines
with relatively poor performance on the three reduced-contrast stimulus
sets to displace the calculated d' values into the high range they occupy. The
high variance caused by the nearness of the (corrected) proportion to 0 inflates
the CI to its extreme size. Although the special nature of DDur has already been
noted, we can still ask whether high sensitivity to DDur is a true characteristic
of people, or perhaps just of foreigners, listening to French. However, the
DDur cue graph (in Fig. 7.4) shows that the NF DDur function is
even more uncharacteristic of that cue across groups than it is across cues
within the naive foreign listeners, which reinforces our suspicion of the NF
DDur result.

At this point, examining the method by which these figures are arrived
at can reveal the source of the DDur function's oddity. Given that there were
about 28 NF responses per token, the numbers of total responses imply a
substantial amount of pooling to produce the totals used in the d'
calculations. The three levels of contrast reduction reported in Table 8.1 each
pool across the three levels of the other factor, original contrast. Each cell in
that 3x3 array also pools across one token for each of the three vowel
environments, /i, a, u/. The responses for the three levels therefore each
comprise responses to 9 different individual tokens. A pair of boundary
tokens was set aside for each cue in each vowel environment, so the A0
condition comprises responses to 6 different tokens. With the exception
of the analysis by vowel, this pattern of pooling, 6 tokens per A0 and 9 per
factorial level, is the same for every d' and CI calculated in this study. Any
change across analyses in the number of responses is due to differences in the
number of subjects in the group or pool.

Table 8.1

Numerical Results for Mods NF DDur

Level      Error (Target)   Total       Response     z-score   var(z)
           responses        responses   proportion
A0 (raw)   0 (→ 0.5)        163         0.0031       -2.739    0.2155
A2         7                247         0.0283       -1.906    0.0265
A4         9                244         0.0369       -1.788    0.0224
A6         12               246         0.0488       -1.657    0.0184
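The columns of Table 8.1 can be recomputed from the counts with a short sketch. It assumes that var(z) is given by the common approximation p(1 - p)/(N φ(z)²), where φ is the standard normal density; under that assumption it reproduces the A2, A4, and A6 rows to the digits shown. The A0 row comes out close to, but not exactly at, the tabled values, so the original computation presumably differed in some detail that the text does not specify. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_and_var(errors, total):
    """Proportion, z-score, and approximate var(z) for one pooled set.

    A count of 0 is replaced by 0.5 (the "half-a-response" correction)
    so that inv_cdf stays finite.  var(z) uses the approximation
    p(1 - p) / (N * phi(z)**2), an assumption of this sketch.
    (A count equal to the total would need a mirror-image correction,
    which is not shown.)
    """
    k = errors if errors > 0 else 0.5
    p = k / total
    z = STD_NORMAL.inv_cdf(p)
    var_z = p * (1 - p) / (total * STD_NORMAL.pdf(z) ** 2)
    return p, z, var_z


# Counts from Table 8.1 (NF DDur): (error responses, total responses)
table_8_1 = {"A0": (0, 163), "A2": (7, 247), "A4": (9, 244), "A6": (12, 246)}
for level, (errors, total) in table_8_1.items():
    p, z, v = z_and_var(errors, total)
    print(f"{level}: p={p:.4f}  z={z:.3f}  var(z)={v:.4f}")
```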

What  can  we  conclude  from  this?  We  can  reasonably  suspect  that  the 
oddity  of  the  NF  DDur  sensitivity  function  is  ultimately  no  more  than  a rare 
effect  of  random  sampling.  In  other  words,  this  particular  time,  when  this 
particular  small  number  of  subjects  listened  to  this  particular  small  set  of 
tokens,  they  happened  to  perform  perfectly  on  the  boundary  tokens  and 
poorly  on  the  factorial  tokens.  If  they  took  it  again,  or  heard  a different 
analogous  set  of  stimuli,  or  if  a different  set  of  subjects  took  the  test,  the 
results would probably regress toward the mean, showing worse performance
on the boundary tokens and better on the factorials. The DDur function
would  then  be  in  a lower  range  with  smaller  CIs. 

Ultimately, this problem reveals a weakness resulting from the lack
of any previous experience with this type of design. In the face of low
expected error rates, an experiment of this sort must compensate with either a
large number of subjects (i.e., responses) or a large variety of stimuli per
class, and preferably both. If the subjects or stimuli are too few, the risk rises
that there will be too little data to counter the ceiling effect. This explanation
is supported by the fact that of the 8 cases where the "half-a-response"
correction was applied to avoid a 0 response rate, none concerned NN, the
largest subject group, and 5 concerned the A0 boundary tokens,
which pooled across 6 rather than 9 stimuli. Possible remedies for this
weakness are discussed in section 8.7.
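The ceiling-effect argument can be made concrete with a simple binomial calculation. Assuming, hypothetically, that every response is an independent error with the same small probability (0.5% below, an illustrative figure rather than a measured one), the chance of observing no errors at all in a pool shrinks as the pool grows. With about 28 NF responses per token, the 6-token A0 pools are therefore the ones most exposed to a zero count, and hence to the half-a-response correction.

```python
def prob_zero_errors(error_rate, n_responses):
    """Probability of observing no error responses at all among
    n_responses independent responses with a fixed error rate."""
    return (1.0 - error_rate) ** n_responses


# Hypothetical 0.5% error rate; about 28 responses per token (NF group).
for label, n_tokens in (("A0 pool, 6 tokens", 6), ("factorial level, 9 tokens", 9)):
    n = 28 * n_tokens
    print(label, n, round(prob_zero_errors(0.005, n), 3))
```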

In the interim, the following are the subject groups, cues, and stimulus
pools where the 0.5 correction was applied:

Group     Cue     Stimulus pool(s)
NF        DDur    Bounds
NF        TAmp    Bounds
LN        DAmp    Bounds
LN        TAmp    A2
LF        TAmp    Bounds, A2, Med
AllFor    TAmp    Bounds

Examination of these data in the graphs shows that all the concerned cases
have  large  CIs,  and  that  two  cases,  LN  DAmp  as  well  as  the  NF  DDur 
discussed  above,  appear  to  have  the  cue  function  displaced  into  a higher 
range. 


8.4  The  Vowel  Analysis 

The  preceding  section  discussed  the  sensitivity  of  the  experiment 
design  to  either  low  numbers  of  subjects  or  low  numbers  of  tokens  in  an 
analyzed  cell.  The  analyses  of  cues  by  vowel  (see  Fig.  7.5)  deserve  special 
discussion  in  that  light. 

The  cue  analysis  by  subject  group  was  done  across  /i,  a,  u/ 
intentionally,  partly  to  span  any  possible  coarticulatory  effects  across  the 
vowel  space,  and  partly  to  avoid  relying  too  heavily  on  a single  token  in  each 
cell  of  the  factorial  matrix.  The  preceding  section  shows  that  even  pooling 
across  6 or  9 tokens  is  no  guarantee  of  a reliable  result. 

Viewed  in  contrast  to  the  regular  analyses,  the  vowel  analysis  might  as 
well  have  been  designed  to  be  deceptive  and  convincing  at  the  same  time.  By 
separating  the  analyses  by  vowel  and  pooling  across  groups,  it  forces  reliance 
on  a single  token  per  factorial  cell,  or  3 per  level  for  the  pooled  cells.  At  the 
same  time,  it  has  a high  number  of  responses  in  the  data  from  all  four  groups, 
so  the  CIs  are  fairly  compact.  Worst  of  all,  by  pooling  across  the  subject 
groups,  it  essentially  claims  that  any  intergroup  differences  are  trivial,  when 
detecting  such  differences  is  a prime  motivation  for  this  research. 

The  unreliability  of  the  vowel  analyses  is  attested  by  the  presence  in 
each  of  the  single-vowel  graphs  (Fig.  7.5)  of  at  least  one  cue  with  a markedly 
descending  slope,  presumably  caused  by  a token  or  two  with  an  unmoderated 
odd  effect.  The  AllVs  graph  does  not  suffer  from  that  particular  malady,  but  it 
still  pools  across  too  many  differences  to  be  useful. 

There  are  experiment  designs  imaginable  which  could  use  the 
elements  explored  in  this  research  to  determine  the  vowel  environment's 
effect  on  stop  voicing  cues.  This  design  is  not  one  of  them. 


8.5  The  Better  Analyses 

Having  explored  which  data  and  analyses  are  untrustworthy  and  why, 
we  can  now  designate  the  complement  to  that  set,  those  which  have  no 
apparent  flaws  and  are  worthy  of  study  and  consideration. 

The  data  in  the  NN  graph  (Fig.  7.2)  are  probably  the  most  reliable.  The 
functions  seem  to  run  in  the  predicted  directions,  none  is  displaced  enough  to 
look  suspicious,  and  the  span  of  the  CIs  is  moderate  because  there  was  a large 
number  of  listeners. 

The  three  other  groups,  NF,  LF,  and  LN  (Fig.  7.2),  all  suffer  from  low 
numbers  of  subjects  and  large  CIs.  However,  note  that  the  cue  functions  for 
NF  and  LF  are  reasonably  similar  in  both  location  and  configuration  (except 
for  the  NF  DDur,  discussed  earlier  in  this  chapter).  Under  these  conditions  it 
seems  to  be  both  valid  and  useful  to  pool  across  the  two  groups  and  prefer  the 
data  in  the  AllFor  graph  (Fig.  7.3).  The  functions  should  be  better  estimates  of 
the  true  sensitivities,  and  the  CIs  are  definitely  smaller. 

Unfortunately,  across  NN  and  LN,  the  similarity  of  functions  only 
holds  for  TAmp  and  TDur.  These  two  can  be  singled  out  for  examination 
within the AllNat graph, but such selective pooling of data is a questionable
practice  on  theoretical  grounds,  since  it  depends  on  the  supposition  that 
treatment  of  each  individual  cue  is  a separable  component  in  speech 
perception,  rather  than  an  aspect  of  some  systematic  superordinate  whole. 
Such  selective  pooling  should  therefore  be  avoided  until  justified  by  further 
research. 

The Cue graphs (Fig. 7.4) are merely a reorganization of the Group data,
so that cross-group comparisons within each cue are more visible. Since the data
from the NN group are the most trustworthy, the NN functions in the Cue
graphs must serve as the reference against which the other functions are judged.
The warnings about displaced functions and large CIs still apply.

8.6  Conclusions 

To  restate  a principle  of  conservative  interpretation,  we  will  not  accept 
any  hypothesis  of  true  effect  unless  the  appropriate  value  lies  outside  the 
confidence  interval  concerned.  Nor  will  we  accept  any  hypothesis  of  a 
difference  between  two  populations  unless  the  confidence  intervals  on  the  d' 
values  representing  those  populations  are  completely  separate  with  no 
overlap.1  Nor  will  we  accept  any  hypothesis  regarding  a cue  function  whose 
configuration, location, or confidence intervals imply that the data behind
it are of dubious validity or replicability. This conservative stance will
allow  us  to  avoid  incorrectly  accepting  false  hypotheses  by  requiring  strong 
evidence  before  any  hypothesis  is  accepted.  The  consequence  of  this 
conservatism  is  the  underrecognition  of  true  hypotheses  for  which  evidence 
is  weak.2  This  consequence  is  not  critical,  considering  that  for  this  first 
application  of  the  new  measurement  system  proposed  here,  a better 
understanding  of  French  stop  consonant  voicing  is  only  a secondary  goal. 
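The cost of the non-overlap criterion can be illustrated by comparing it with a z-test on the difference between two d' estimates. The sketch assumes 95% normal-theory intervals, d' ± 1.96·SE, and independent estimates; the function names and the numbers in the usage line are hypothetical. Because SE1 + SE2 is never smaller than sqrt(SE1² + SE2²), any pair that passes the non-overlap rule also passes the z-test, but not the reverse, which is the Type I/Type II trade-off noted in footnote 2.

```python
import math

Z95 = 1.96  # assumed two-sided 95% normal quantile


def cis_separate(d1, se1, d2, se2, z=Z95):
    """Conservative rule of this chapter: the intervals d +/- z*se must not
    overlap at all, which is equivalent to |d1 - d2| > z*(se1 + se2)."""
    return abs(d1 - d2) > z * (se1 + se2)


def difference_significant(d1, se1, d2, se2, z=Z95):
    """Alternative (not used in this study): z-test on d1 - d2, assuming
    the two estimates are independent."""
    return abs(d1 - d2) > z * math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical pair: the z-test detects the difference, the rule does not.
print(cis_separate(1.0, 0.15, 0.5, 0.15), difference_significant(1.0, 0.15, 0.5, 0.15))
```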

The  primary  goal  of  this  research  is  to  demonstrate  the  feasibility  of  an 
autonomous  generalizable  measurement  technique  to  quantitatively  address 
certain  difficult  questions  of  cue  usage  differences  in  speech  perception. 
Within this particular experiment, there are two questions so posed:

Are there intragroup differences in the use of different cues?

and

Are there intergroup differences in the use of the same cue?

A more basic question:

Are there within-group within-cue use differences across the factorial
levels of the cue modification?

can also be answered, although such a simple problem does not justify use of
this complex procedure unless accompanied by the other two, more difficult,
questions.

1 In principle, it should be possible to estimate the probability of a difference
for those populations whose CIs overlap. However, in light of the
discussion in the next section, that would be of very limited help against a
problem of much larger scope, and ultimately not a worthwhile strategy.

2 In statistical terms, this reduces the probability of Type I error at the expense
of increasing the probability of Type II error.

Effects 


The data in this experiment (Fig. 7.2) provide sufficient evidence to
support the hypothesis that naive native French speakers (the NN subject
group) use both the TDur cue and the DAmp cue more than the DDur cue, as
those cues are constructed in this experiment. In other words, they use
consonant hold duration in perceiving the voicelessness of /t/, and closure
voicing amplitude in perceiving the voicedness of /d/, differently from, and
specifically more than, consonant hold duration in perceiving the voicedness
of /d/. The TDur d' estimate function attains a calculated value of 0.9825
at the A6 stimulus level, and the A6 DAmp estimate attains 0.6267. The DDur
estimate is -0.3230 at the same A6 level, and its confidence interval at that
setting does not overlap those of the two higher cues. The paradoxical nature
of this DDur cue notwithstanding (section 8.2 above), the difference between
DDur (as constructed) and both TDur and DAmp has valid statistical support.
The calculated values are somewhat different, but the same cue differences
are supported in the All Natives analysis (Fig. 7.3), where the NN results
provide the majority of the data.


The only within-cue across-group difference with apparent support (see
Fig. 7.4) is the DDur difference between naive foreign speakers (the NF group)
and naive native speakers (NN), but that support is disqualified, since it
depends crucially on the NF DDur d' estimate function, which was discussed
above (section 8.3) as suspicious and probably based on flawed data.

A within-cue  within-group  difference  across  levels  can  be  validated  in 
either  of  two  ways.  Stimuli  at  two  levels  of  the  same  cue  may  be  significantly 
different  from  one  another,  or  stimuli  at  one  level  (presumably  a higher  level 
of  modification)  may  be  significantly  different  from  a d'  of  0,  where  that  d' 
value  represents  no  sensitivity.  It  is  not  clear  which  of  these  criteria  is 
ultimately  preferable.  In  this  experiment,  the  first  criterion  is  less  often 
attained,  possibly  because  the  d'  estimate  functions  only  extend  up  to  A6.  For 
thoroughness,  effects  validated  by  both  criteria  are  reported.  First,  in  these 
four  functions,  the  A6  stimuli  are  significantly  higher  than  the  A2  stimuli: 

NN  DAmp,  NF  DAmp,  AllNat  DAmp,  AllFor  DAmp. 

Those  4 functions,  plus  the  following  6,  have  A6  stimuli  significantly  above  a 
d'  of  0: 

NN TDur, LN TDur, LF DAmp, AllNat TDur, AllFor DDur, AllFor TAmp.3

If the negative d' values at A2 are symptoms that the whole function is
underestimated, then more functions than the ten listed here would have a
qualifying A6.

3 NF DDur would again be listed were it not likely based on flawed data.

Trends

The term "trend" refers here to any phenomenon in the data which is
not verifiable within the parameters of this investigation, but which seems
either strong enough or interesting enough to merit comment and possibly
further investigation. Trends include two types of phenomena: hypotheses
tested in this research that produced evidence too weak for conservative
acceptance, and phenomena incidentally revealed by this research that would
demand a different experiment design to be tested.

The most important trend is that found in the NN group (Fig. 7.2),
where the stimulus functions cluster within a range of about 0.5 on the d'
scale at the A2 and A4 levels, then separate to a span of almost 1.5 at the A6
level.4 This is interesting for at least two reasons. First, notice that A6 is
where the only significant differences between stimulus functions appear.
Furthermore, this occurs in the group with the largest number of subjects, so
the pattern is likely to be valid and repeatable.

Second,  recall  that  the  factorial  levels  are  constructed  relative  to  the 
surveyed  distribution  (presumed  normal)  of  the  cues  in  the  corpus  of  tokens. 
Recall  also  that  a normal  curve  is  considered  to  be  effectively  6 standard 
deviations from tail to tail. It can be inferred that displacing normally
distributed stimuli by 6 standard deviations will take virtually all of
them into a region outside their native range. This would not necessarily be
the  case  at  the  A2  and  A4  steps.  If  the  sensory  or  cognitive  representation  of 
cues  included  a representation  of  their  distribution  in  nature,  this  is  a pattern 
one  could  plausibly  expect:  a distinct  differentiation  of  cue  importance  at  the 
borders  of  their  natural  occurrence. 
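The 6-standard-deviation argument can be checked numerically. The sketch assumes, for illustration only, that tokens are drawn at random from a normal distribution of the cue and that the "native range" is the mean ± 3 SD; the tokens in this study were in fact selected by original contrast, so the figures describe an idealized case. The function computes the fraction of such tokens that would still lie inside the native range after every token is displaced by k SDs; its name is hypothetical.

```python
from statistics import NormalDist


def fraction_still_in_range(shift_sd, half_width_sd=3.0):
    """Fraction of tokens drawn from N(0, 1) that stay within
    +/- half_width_sd after every token is shifted by shift_sd SDs."""
    std = NormalDist()
    # x + shift lies in [-w, w]  <=>  x lies in [-w - shift, w - shift]
    return std.cdf(half_width_sd - shift_sd) - std.cdf(-half_width_sd - shift_sd)


for k in (0, 2, 4, 6):
    print(f"A{k}: {fraction_still_in_range(k):.4f}")
```

Under these assumptions roughly 84% of tokens remain in range after a 2-SD shift and about 16% after a 4-SD shift, but only about 0.13% after a 6-SD shift, which is consistent with the contrast drawn above between A6 and the A2 and A4 steps.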

The second trend to remark is, in a sense, the absence of the first.
Neither of the nonnative speaker groups, NF or LF (Fig. 7.2), nor the pool of
the two in AllFor (Fig. 7.3), shows a flaring of the cue functions at the A6 step.
While it is possible that this is a random occurrence tied to the low numbers
of subjects in the two groups, it is worth considering whether it results instead
from the diversity of expected cue distributions in the extremely diverse pool
of native languages included in the two speaker groups. It may be that each
cue function has an "inflection" point characteristic of the cue's distribution
in the language, and that by pooling too great a diversity of languages together,
we have lost any hope of discerning those points.

4 It is not surprising to find the same effect in the AllNat group, where
the NN subjects are the majority.

The  third  trend  is  a restatement  of  the  observation  in  section  8.1  that 
many of the cue functions seem to start with the A2 stimuli in the region of
d' = -0.25. This is presumably a problem with the calibration of the functions, as
mentioned  before,  and  will  require  another  investigation,  appropriately 
redesigned,  to  determine  whether  the  typical  departure  point  for  cue  use 
functions  is  in  the  vicinity  of  0,  or  rather  some  low  but  positive  value. 

The  last  trend  to  mention  here  requires  a close  reading  of  the  Cue 
graphs  (Fig.  7.4).  Note  within  the  TAmp  graph  that  the  functions  for  the  two 
native  groups,  NN  and  LN,  are  generally  low  and  barely  increase  from  A4  to 
A6.  The  NF  TAmp  function  is  situated  higher,  and  both  the  NF  and  LF 
functions  clearly  increase  throughout  their  range.  By  contrast,  in  the  TDur 
graph,  the  functions  for  the  two  native  groups  are  high  and  rising,  while  the 
LF  function  is  low  and  the  NF  function  is  essentially  flat  from  A4  to  A6. 

These  patterns  suggest  that  French  natives  put  more  weight  on  duration  than 
on  burst  amplitude  in  determining  the  voicelessness  of  /t/,  while  the  foreign 
groups  (perhaps  because  of  the  numerous  native  English  speakers)  rely  more 
heavily  on  the  burst  amplitude.  We  can  see  in  the  All  Natives  graph  (Fig.  7.3) 
that  the  TAmp  and  TDur  CIs  barely  overlap  at  the  A6  level,  so  the  two 
functions  might  well  be  provably  different  at  the  A8  level  or  with  responses 
from  a few  more  subjects.  Moreover,  comparing  the  TDur  cue  functions 


141 


across  the  All  Natives  and  All  Foreign  graphs,  we  find  the  same  situation: 
There  is  little  overlap  at  the  A6  level,  and  the  two  functions  might 
differentiate  with  more  subjects  or  if  extended  to  A8. 

Methodological  Issues 

As  discussed  at  the  beginning  of  this  section,  this  research  was 
conducted  to  test  the  ability  of  this  new  measurement  technique  to  render 
quantitative  answers  to  questions  about  intragroup  use  of  different  cues  and 
intergroup use of the same cue. A more basic but secondary type of question
concerned intragroup, same-cue differences across settings of the cue.

We  have  seen  above  that  the  statistical  results  show  an  intragroup 
difference  between  one  cue  (DDur)  and  two  others  (DAmp  and  TDur),  and  we 
have  seen  several  differences  across  settings  of  the  same  cue.  We  have  also 
seen  a trend  that  has  potential  for  showing  intergroup  differences  in  use  of  a 
single  cue  (in  TDur  and  possibly  TAmp). 

We  can  therefore  conclude  that  this  new  measurement  technique  has 
responded  successfully  to  2 of  the  3 types  of  questions  which  it  needs  to 
address,  and  we  can  conclude  more  generally  that  it  successfully  renders 
quantitative  measures  of  the  use  (or  importance,  or  weight)  of  acoustic  cues 
in  speech. 


8.7  Recommendations  for  Redesign 

Two  fundamental  improvements  would  make  this  line  of  research 
fully  productive.  The  first  involves  developing  a more  reliable  base  rate,  and 
the  second  finding  a better  method  for  determining  statistically  supportable 
hypotheses. 


The  Base  Rate 

The  d'  values  in  this  study  were  all  calculated  with  a method 
analogous  to  cumulative  d'  (Macmillan  & Creelman,  1991).  The  method  uses 
responses  to  unmodified  boundary  tokens  to  supply  the  false  alarm  rate 
which  is  subtracted  (after  z-transforms)  from  hit  rates  of  all  the  successive 
stimuli.  If  that  single  false  alarm  rate  is  unreliable,  so  is  the  entire  cue 
sensitivity  function  calculated  from  it. 
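The calculation described here can be sketched as follows, taking each d' as z(H) - z(F), with F the response rate to the A0 boundary tokens. The variance of each d' is assumed to be the sum of the two var(z) values (independent estimates) and the interval a 95% normal one; the CI procedure actually used is the one given in section 7.5, which this sketch does not claim to reproduce. Applied to the z and var(z) columns of Table 8.1, it also shows why an unreliable base rate is so damaging: shifting z(F) moves every point of the function by the same amount. The function name is hypothetical.

```python
import math


def d_prime_series(z_by_level, var_by_level, base="A0", z_crit=1.96):
    """Return {level: (d', lower, upper)} with d' = z(H) - z(F).

    F is the base (A0 boundary-token) rate.  Var(d') is assumed to be
    var_z(H) + var_z(F), and the interval d' +/- z_crit * sqrt(Var(d')).
    """
    z_f, v_f = z_by_level[base], var_by_level[base]
    result = {}
    for level, z_h in z_by_level.items():
        if level == base:
            continue
        d = z_h - z_f
        half = z_crit * math.sqrt(var_by_level[level] + v_f)
        result[level] = (d, d - half, d + half)
    return result


# z and var(z) columns of Table 8.1 (NF DDur)
z_tab = {"A0": -2.739, "A2": -1.906, "A4": -1.788, "A6": -1.657}
v_tab = {"A0": 0.2155, "A2": 0.0265, "A4": 0.0224, "A6": 0.0184}
for level, (d, lo, hi) in d_prime_series(z_tab, v_tab).items():
    print(f"{level}: d'={d:.3f}  CI=({lo:.3f}, {hi:.3f})")

# A base rate whose z is 0.5 higher lowers every d' by 0.5.
shifted = dict(z_tab, A0=z_tab["A0"] + 0.5)
print(d_prime_series(shifted, v_tab))
```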

In fact, the boundary tokens used here do deliver an unreliable response
rate,  for  three  reasons.  First,  they  are  "boundary"  tokens:  they  have  the  least 
original  contrast  with  the  opposing  phoneme  and  are  presumably  more  likely 
than  other  tokens  to  be  mistaken  for  it.  They  were  used  to  supply  the  base 
rate  partly  to  ensure  that  any  effect  claimed  for  the  modified  tokens  would 
have  been  proven  greater  than  that  of  any  possible  natural  tokens. 
Unfortunately,  that  certainty  seems  to  have  been  achieved  at  the  expense  of 
accuracy,  since  the  apparent  result  of  using  such  skewed  tokens  was  the 
shifting  of  the  entire  cue  sensitivity  function  to  a lower  range  of  values. 

Second,  the  boundary  tokens  are  too  few  in  number  in  this  design. 
Recall  that  each  cue  for  each  word  class  was  instantiated  by  a 3x3  factorial 
array  plus  2 boundary  tokens.  The  consequence  is  that  regardless  of  how 
pooling is organized, the false alarm rate comprises responses to only two-thirds
as many tokens as do the hit rates for each factorial level, and is therefore
only two-thirds as resistant to unexpectedly extreme token response rates. As
was  seen  with  the  displaced  NF  DDur  cue  function,  it  would  actually  be 
preferable  to  sacrifice  some  accuracy  at  the  factorial  levels  in  order  to  attain  a 
reliable  base  rate. 
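The "two-thirds" figure can be read as a ratio of precisions. In a deliberately simplified model (hypothetical, and ignoring within-token binomial noise) where each token's true response rate varies around the class mean with some between-token SD, the standard error of a rate pooled over n tokens is SD/√n. The 6-token base rate then has a precision (inverse variance) two-thirds that of a 9-token factorial level, or equivalently a standard error about 1.22 times larger, whatever the SD.

```python
import math


def pooled_se(between_token_sd, n_tokens):
    """Standard error of a response rate averaged over n_tokens tokens
    whose true rates vary with SD between_token_sd (within-token
    binomial noise ignored, an illustrative simplification)."""
    return between_token_sd / math.sqrt(n_tokens)


sd = 0.02  # hypothetical between-token SD of error rates
se_base = pooled_se(sd, 6)   # A0: 2 boundary tokens x 3 vowels
se_level = pooled_se(sd, 9)  # factorial level: 3 original-contrast levels x 3 vowels
precision_ratio = (se_level / se_base) ** 2  # (1/var_base) / (1/var_level)
print(se_base / se_level, precision_ratio)
```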


Third,  when  the  tokens  are  few  in  number,  the  design  will  always  be 
susceptible  to  unwanted  token  phenomena  affecting  the  various  response 
rates  differently.  This  is  unavoidable  when,  in  order  to  use  natural  stimuli, 
the  experimentally  uncontrolled  cues  in  the  tokens  differ  across  stimulus 
conditions,  as  they  do  here. 

A fairly simple change in experiment design would remedy all these
aspects of the problem. Recall that single-use tokens were used in
this study to ensure that the nonexperimental cues were able to vary through
their normal range. Consider the possibility of using the same natural token
to create all the stimulus conditions, such as the A0, A2, A4, and A6 in this
study. The stimuli could then be presented so that each subject heard only
one of the set of stimuli derived from the same source. It would also be
possible to arrange the tests so that each subject heard the A0 and one other
from the set, but this would require enough stimuli per test session for the
two members of a pair to be well separated, to ensure that subjects are not
responding to the second based on memory of the first. Either way, the A0
base condition would receive at least as many tokens and responses as each of
the other stimulus conditions. Furthermore, the same experimentally
uncontrolled stimulus effects would occur in all stimulus conditions
(A0, . . . , A6), so they would affect both the false alarm and hit rates and
thereby balance out. In short, this kind of simple redesign should furnish a
much more reliable and accurate baseline for the calculation of the cue
sensitivity functions.
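One way to realize the first arrangement, in which each subject hears only one stimulus derived from any given source token, is a Latin-square style rotation. The sketch below is a hypothetical design aid, not part of this study: subject s hears source token t at level (s + t) mod 4. When the number of subjects is a multiple of four, every source token is heard at every level equally often across subjects; when the number of source tokens is a multiple of four, each subject hears every level equally often, so the A0 condition receives as many tokens and responses as any other. Presentation order within a session would still need to be randomized.

```python
LEVELS = ("A0", "A2", "A4", "A6")


def assign_levels(n_subjects, n_sources, levels=LEVELS):
    """plan[s][t] is the level at which subject s hears source token t.

    Each subject hears each source token exactly once; rotating the level
    by (s + t) balances levels across subjects and across tokens when
    n_subjects and n_sources are multiples of len(levels)."""
    k = len(levels)
    return [[levels[(s + t) % k] for t in range(n_sources)]
            for s in range(n_subjects)]


plan = assign_levels(n_subjects=8, n_sources=12)
for s, row in enumerate(plan[:4]):
    print(f"subject {s}: {row[:6]} ...")
```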

Statistical  Models 


The  problems  involved  in  extracting  statistically  valid  conclusions 
from a design this complex have been discussed above in section 7.5. Clearly,
the confidence interval strategy employed here is inadequate for the task.
There  are  no  simple  solutions,  but  let  us  reiterate  that  there  are  two 
promising  possibilities  for  improvement:  the  "gradient  in  proportion" 
discussed  by  Fleiss  (1980),  and  the  logistic  or  log-linear  models  for  which 
Fienberg  (1977)  can  serve  as  an  introduction. 
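As an illustration of the first possibility, the sketch below implements the Cochran-Armitage chi-square test for a linear trend in proportions across ordered groups, which we take to be the kind of test Fleiss describes as a gradient in proportions; that correspondence, the function name, and the choice of scores are assumptions. The usage line applies it to the NF DDur counts of Table 8.1 at A2, A4, and A6 only, leaving out A0 because the boundary tokens are different tokens from the factorial ones. It is offered as a sketch of the method, not as a reanalysis.

```python
import math


def trend_chi2(events, totals, scores):
    """Chi-square test (1 df) for a linear trend in proportions across
    ordered groups (Cochran-Armitage form).  Returns (chi2, p_value)."""
    n = sum(totals)
    p_bar = sum(events) / n
    x_bar = sum(t * x for t, x in zip(totals, scores)) / n
    t_stat = sum(r * (x - x_bar) for r, x in zip(events, scores))
    ss_x = sum(t * (x - x_bar) ** 2 for t, x in zip(totals, scores))
    chi2 = t_stat ** 2 / (p_bar * (1 - p_bar) * ss_x)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


chi2, p = trend_chi2(events=[7, 9, 12], totals=[247, 244, 246], scores=[2, 4, 6])
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```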

Further  Possible  Improvements 

There  may  be  many  other  ways  to  improve  the  design  applied  in  this 
study,  but  four  deserve  mention. 

First,  section  8.6  mentioned  the  possibility  of  an  "inflection"  point  in 
the  neighborhood  of  A6.  This  would  be  a point  where  the  sensitivity  for 
several  cues  changes  slope  in  response  to  stimuli  which  were  recognizably 
outside  their  natural  distribution.  To  test  that  possibility,  it  would  be 
advisable  to  continue  the  stimulus  series  beyond  A6  to  A8  and  possibly  A10, 
and  also  to  test  more  cues. 

Second,  it  should  also  be  interesting  to  test  the  inflection  point 
hypothesis using L2 listeners from a specific L1 whose surveyed cue
distribution  is  known  to  be  "shorter,"  in  the  sense  that  their  inflection  point 
should  come  at,  say,  A4  in  the  L2.  We  might,  in  such  a case,  be  able  to  find  a 
difference  in  the  inflection  point  among  listeners  differing  in  age  of 
acquisition  or  in  acquired  skill  level.  This  would  of  course  depend  on  the 
prior  determination  of  better  statistical  tests. 

Third,  it  would  be  possible  to  lower  overall  performance  and  avoid  the 
ceiling  effects  that  were  a problem  here  by  using  the  classic  techniques  from 
experimental  psychology  of  signal  degradation  and  divided  attention.  Care 
should  be  exercised  in  using  these  techniques,  however,  because  of  the 
possibility  they  could  affect  cues  differentially.  For  instance,  adding  white 


145 


noise  would  very  likely  change  perceptual  use  of  any  spectral  noise  cues  in 
fricatives,  affricates,  and  bursts,  but  might  not  affect  any  fundamental 
frequency  perturbation  cues  normally  involved  in  the  same  contrast.  This 
would  naturally  skew  any  results  from  experiments  using  the  technique. 
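For signal degradation, a usual starting point is to mix the stimulus with white noise at a fixed signal-to-noise ratio. The minimal sketch below assumes the waveform is a list of float samples and rescales Gaussian noise so that the ratio of mean signal power to mean added-noise power is exactly the requested SNR in decibels; the function name and the 10 dB value in the usage line are illustrative. As cautioned above, such noise would mask burst and frication cues more than fundamental frequency cues, so any level chosen this way degrades the cues unequally.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return signal plus Gaussian white noise at exactly snr_db (dB).

    The noise is rescaled by its own measured power, so the mean-power
    ratio of signal to added noise equals the requested SNR.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    target_noise_power = p_signal / 10 ** (snr_db / 10)
    scale = math.sqrt(target_noise_power / p_noise)
    return [s + scale * v for s, v in zip(signal, noise)]


# Illustrative 440 Hz tone, 0.1 s at 16 kHz, degraded to 10 dB SNR.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noisy = add_white_noise(tone, snr_db=10, seed=1)
```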

Last,  it  is  of  course  always  important  to  get  an  adequate  amount  of  data. 
In  this  design,  the  NN  group  stood  out  as  having  the  smallest  CIs  partly 
because  of  the  comparatively  large  number  of  subjects  in  that  group,  so  in 
retrospect,  the  other  groups  (foreigners  and  linguists)  would  have  benefited 
from  more  subjects.  Accordingly,  one  of  the  advantages  to  be  hoped  for  from 
improved  statistical  analysis  techniques  is  a way  of  calculating  the  number  of 
subjects  needed  before  the  experiment  is  run. 
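Pending better statistical models, a rough planning calculation is already possible within the confidence interval framework used here. The hypothetical sketch below reuses the var(z) approximation p(1 - p)/(N φ(z)²), assumes one response per subject per token and independent hit and false-alarm estimates, and searches for the smallest number of subjects whose 95% d' interval would have at most a chosen half-width. The expected proportions in the usage line are guesses a researcher would have to supply, not results of this study.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def var_z(p, n):
    """Approximate sampling variance of inv_cdf(p) estimated from n responses."""
    z = STD.inv_cdf(p)
    return p * (1 - p) / (n * STD.pdf(z) ** 2)


def subjects_needed(p_hit, p_fa, tokens_hit, tokens_fa, half_width,
                    z_crit=1.96, max_subjects=10_000):
    """Smallest number of subjects for which z_crit * sqrt(var_z(H) + var_z(F))
    is at most half_width, assuming one response per subject per token.
    Returns None if max_subjects is not enough."""
    for s in range(1, max_subjects + 1):
        v = var_z(p_hit, s * tokens_hit) + var_z(p_fa, s * tokens_fa)
        if z_crit * math.sqrt(v) <= half_width:
            return s
    return None


# Guessed planning values: 1% errors on A0 (6 tokens), 5% at a level (9 tokens).
print(subjects_needed(p_hit=0.05, p_fa=0.01, tokens_hit=9, tokens_fa=6, half_width=0.5))
```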


CHAPTER  9 

CONCLUSIONS  AND  PROSPECTS 


The  results  of  the  experiment  having  been  analyzed  in  the  previous 
chapter,  this  final  chapter  considers  the  interest  this  project  may  hold  for 
similar  work  and  related  fields. 

9.1  Innovations 

The  main  goal  of  this  research  is  to  show  the  feasibility  of  an 
autonomous  generalizable  measurement  technique  which  can  apply 
quantitative  methods  to  questions  of  cue  usage  differences  in  speech 
perception.  This  measurement  can  be  expected  to  apply  in  many  situations, 
but  it  is  crucially  needed  in  cross-language  comparisons  where  the  expected 
distributions  of  the  cues  may  differ  and  where  none  of  the  extant  methods  for 
perceptual  research  are  appropriate. 

The  strategy  this  research  has  employed  to  try  to  satisfy  this  need  is  as 
follows: 

First,  a corpus  of  the  cues  to  be  studied  is  recorded  and  measured,  in 
order  to  determine  the  statistical  distribution  of  the  cues  in  the  language 
under  study.  Speakers  (or  rather  listeners)  of  the  language  are  presumed  to 
have,  as  part  of  their  internal  representation,  an  expected  measurement  and 
expected  accuracy  for  each  cue.  Since  those  expectations  are  presumably 
arrived  at  in  memory  through  the  perceptual  equivalent  of  a survey,  the 
specification of a cue's distribution (typically the mean and standard
deviation) as surveyed in an appropriately constructed corpus is the best
available  estimate  of  those  internal  expectations. 

Second,  the  surveyed  standard  deviations  of  the  cues  are  used  to 
generate  a stimulus  series  at  a regular  pattern  of  intervals  common  to  all  the 
cues  under  investigation,  like  the  2,  4,  and  6 standard  deviations  of  cue 
modification  used  herein.  The  use  of  the  standard  deviation  to  define  the 
step  size  in  the  stimulus  series  has  the  effect  of  normalizing  the  signal 
strength across cues, yet it accomplishes this by using cue-internal criteria.

This  is  the  crucial  step  which  allows  an  analogy  to  be  struck  across  any  desired 
set  of  linguistic  conditions,  including  across  languages. 
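The following minimal sketch illustrates SD-based step sizes. It assumes that each natural token's cue value is shifted by 2, 4, and 6 of that cue's corpus standard deviations, toward the side where the other category lies. That direction convention is an assumption made for illustration. The helper names and the two cues `cue_A` (in ms) and `cue_B` (in Hz) are hypothetical. The point is that cues measured on different physical scales end up at the same distance on the cue-internal scale.

```python
def stimulus_series(value: float, sd: float, direction: int,
                    steps: tuple[float, ...] = (2, 4, 6)) -> list[float]:
    """Target cue values for one natural token, shifted by multiples of the cue's SD.

    direction is +1 or -1: the (assumed) side of the token's own category on
    which the other category lies for this cue.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if sd <= 0:
        raise ValueError("sd must be positive")
    return [value + direction * k * sd for k in steps]


def shift_in_sd_units(original: float, modified: float, sd: float) -> float:
    """Express a cue modification on the common, cue-internal scale."""
    return (modified - original) / sd


# Two hypothetical cues measured on different physical scales.
cue_a_targets = stimulus_series(64.0, 4.5, +1)    # cue_A, in ms
cue_b_targets = stimulus_series(120.0, 8.0, -1)   # cue_B, in Hz
print(cue_a_targets, cue_b_targets)

# Both 4-SD stimuli sit at the same normalized distance from their originals.
print(shift_in_sd_units(64.0, cue_a_targets[1], 4.5),
      abs(shift_in_sd_units(120.0, cue_b_targets[1], 8.0)))
```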

The  stimulus  series  must  include  a large  cohort  of  natural  tokens,  so 
that  the  many  cues  not  under  experimental  control  will  vary  according  to 
their  own  natural  distributions.  The  alternate  method,  creating  a stimulus 
series  by  successive  modification  of  a single  prototype,  fixes  all  the 
uncontrolled  cues  at  values  unknown  to  the  researcher,  and  skews  the 
resulting perception in ways the researcher has no way to estimate.

Finally,  the  stimuli  are  played  to  appropriate  listeners  in  perception 
tests,  and  the  resulting  perceptual  functions  are  compared  across 
experimental  conditions  at  the  points  of  analogy  designed  into  the  stimulus 
array  during  the  second  step,  above.  It  is  important  that  this  comparison 
respect  the  potentially  unique  situation  of  each  cue  (as  the  normalization 
process  did  at  stimulus  generation),  since,  for  instance,  the  same  cue  could 
conceivably  be  used  more  in  one  language  than  another,  yet  have  a weaker 
effect  because  of  differences  in  density  in  the  phonological  or  phonetic  space 
in  which  the  cue  operates.  This  experiment  has  tried  to  address  that  concern 
by  finding  a way  to  perform  that  comparison  using  d',  the  sensitivity 
measurement from signal detection theory. The advantage of d' is that it
abstracts away from the raw performance data (which are known to vary
across  languages  and  conditions  in  linguistically  trivial  and  phonetically 
unenlightening  ways)  to  reveal  the  underlying  differences  in  perceptual 
effect.  It  does  this  on  an  abstract  scale  that  allows  comparison  across  radically 
different  linguistic  conditions. 
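Under the standard equal-variance Gaussian model, d' is the difference between the z-transformed hit and false-alarm rates, d' = z(H) − z(F). The sketch below maps these terms onto the present design in one plausible way. The mapping is an assumption and not necessarily the author's exact procedure. Here, a "hit" is a category-switch response to a modified token, and a "false alarm" is the same response, an error, to an unmodified token, which serves as the base rate. All rates in the sketch are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal detection sensitivity: d' = z(H) - z(F)."""
    for p in (hit_rate, false_alarm_rate):
        if not 0 < p < 1:
            raise ValueError("rates must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_alarm_rate)


# Assumed mapping onto this design (one plausible reading):
#   hit         = a category-switch response to a modified token
#   false alarm = the same response (an "error") to an unmodified token
switch_rate_modified = 0.40    # hypothetical rate at one modification step
error_rate_unmodified = 0.02   # hypothetical base error rate
print(f"d' = {d_prime(switch_rate_modified, error_rate_unmodified):.2f}")
```

The z transform is nonlinear, so the same raw difference of ten percentage points does not yield the same d' in all conditions. For example, d_prime(0.12, 0.02) is larger than d_prime(0.50, 0.40). This is the sense in which d' abstracts away from raw performance differences between conditions.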

Taken  as  a whole,  this  procedure  describes  a measurement  system 
which  is  quantitative,  independent,  abstract,  and  autonomous,  in  exact  accord 
with  the  desiderata  of  section  1.1.  If  appropriately  applied,  it  should  allow 
comparison  of  cue  importance  across  languages,  across  phonemes,  across 
environments,  in  short  across  any  phonological  conditions. 

9.2  Directions 

The  experiment  described  here  was  the  first  attempt  to  apply  this 
complex  measurement  scheme.  As  such,  its  importance  is  not  primarily  in 
the  findings  it  supports  concerning  the  particular  language  materials  studied, 
in  this  case,  four  cues  to  the  intervocalic  /t~d/  voicing  decision  in  French.  Its 
importance  is  rather  the  light  that  it  shines  on  the  methodology  whereby  the 
measurement  system  is  applied,  and  the  experiment's  findings  in  that  regard 
are  quite  valuable. 

The  experiment  has  shown  that: 

1)  The  extremely  low  error  rates  in  perception  of  natural  speech  leave 
this  particular  design  vulnerable  to  ceiling  effects  because  of  the  low  number 
of  tokens  pooled  into  each  stimulus  condition.  This  is  particularly  dangerous 
to  the  base  error  rate  for  unmodified  (AO)  tokens,  since  that  rate  is  used  in  the 
calculation of d' values for the entire modified stimulus series. Section 8.7
details several possible design changes to counter this problem.

2) The development of a more probative statistical methodology is
crucial for making the best possible use of the data generated and for producing
valid and replicable results. The confidence interval strategy employed here was
the  best  that  could  be  implemented  under  the  circumstances.  But  there  are  a 
number  of  other  strategies  worth  investigating  which  could  do  better  at 
modeling  the  data  and  revealing  statistically  significant  effects. 

3) The technique of repeating the first stimuli at the end of the test was
not needed to ensure the validity of the subjects' responses to those tokens:
familiarization with the speaker's voice plus a short practice test adequately
prepares the subjects for their response task. Dropping the repetition also
avoids the loss of "event independence" that it entails, so that loss need
not further complicate the statistical model sought in point 2.

4) A pool of second language speakers with no restrictions on their
native language may be too heterogeneous to generate data that resemble a
natural language's use of cues. If cue perception functions have a natural
"inflection point" where the stimuli escape the cue's natural distribution, L2
perception by a group unrestricted with regard to L1 would mask that
inflection to the extent that the cue's distribution differs across the
various L1s. If the group were restricted to a particular L1, the inflection point
might be apparent, but shifted from its expected location in the L2's native
speakers, with that shift predictable from the cue distribution in the
designated L1.
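Points 1 and 2 can be made concrete with a sketch that protects d' against the ceiling problem and attaches an approximate confidence interval to it. Both devices are common in signal detection work, but here they are swapped-in assumptions. They are not the correction or the confidence interval strategy actually used in Chapter 8. The log-linear correction adds 0.5 to each count and 1 to each number of trials, which keeps the rates away from 0 and 1. The interval uses the Gourevitch-Galanter large-sample variance approximation for d'. The function names and counts are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def loglinear_rate(count: int, n: int) -> float:
    """Log-linear correction: (count + 0.5) / (n + 1), never exactly 0 or 1."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    return (count + 0.5) / (n + 1)


def d_prime_with_ci(hits: int, n_signal: int, false_alarms: int, n_noise: int,
                    level: float = 0.95) -> tuple[float, float, float]:
    """Return (d', lower, upper) using corrected rates and the
    Gourevitch-Galanter large-sample variance approximation."""
    h = loglinear_rate(hits, n_signal)
    f = loglinear_rate(false_alarms, n_noise)
    z_h, z_f = _STD_NORMAL.inv_cdf(h), _STD_NORMAL.inv_cdf(f)
    d = z_h - z_f
    variance = (h * (1 - h) / (n_signal * _STD_NORMAL.pdf(z_h) ** 2)
                + f * (1 - f) / (n_noise * _STD_NORMAL.pdf(z_f) ** 2))
    half_width = _STD_NORMAL.inv_cdf(0.5 + level / 2) * math.sqrt(variance)
    return d, d - half_width, d + half_width


# Hypothetical ceiling case: no errors on 24 unmodified tokens, which would make
# z(F) infinite without a correction, and 9 switches on 24 modified tokens.
d, lower, upper = d_prime_with_ci(hits=9, n_signal=24, false_alarms=0, n_noise=24)
print(f"d' = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

When the baseline has zero errors, the resulting interval is wide. This shows numerically why point 1 calls for more tokens per stimulus condition.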

These  methodological  findings  all  point  to  specific,  implementable 
improvements  that  can  be  made  in  the  design  of  future  experiments  using 
this  measurement  system.  It  is  those  future  experiments  which  will  put  us 
on  track  to  a general  understanding  of  the  use  of  cues  in  speech  perception. 


9.3  Significance 

The  significance  of  this  project  lies  in  its  potential  for  strengthening 
some  of  the  less  developed  lines  of  reasoning  encompassed  by  Jakobson,  Fant, 
and  Halle  in  Preliminaries  to  Speech  Analysis  (1952/1969).  That  work  was  a 
milestone in many language-related fields through its attempt to define a feature
system  used  to  encode  and  decode  all  possible  contrasts  in  all  human 
languages  through  the  acoustic  speech  signal  as  the  primary  carrier  of 
linguistic  information.  Their  preface  makes  clear  that  the  authors  intended 
their  features  as  a unifying  base  for  discussion  and  experimentation  for  all 
researchers  in  language  and  speech. 

Their  work  certainly  helped  inspire  discussion  and  experimentation, 
but  their  unification  of  such  a broad  spectrum  of  tenets  was  never  accepted 
among  language  researchers  as  a whole  for  a number  of  reasons.  For 
instance,  there  remains  an  ongoing  debate  over  the  nature  of  the  basic 
psychological  representations  (or  units)  of  the  sounds  of  language.  They  may 
be  articulatory,  auditory-acoustic,  or  abstract-linguistic,  and  good  arguments 
are  presented  for  all  three  viewpoints.  Opinions  are  also  extremely  diverse 
over  how  such  units  can  explain  the  observed  facts  of  language  acquisition, 
diachronic  change,  and  biological  phenomena  such  as  aphasia.  And  of  course 
there  is  a vast  literature  on  what  is  now  generally  considered  the  hopeless 
search  for  useful  acoustic  invariance  in  the  speech  signal. 

The  speech  and  phonetics  communities  are  acutely  aware  of  the 
diversity  of  opinion  on  these  matters.  Their  reaction  to  it  can  reasonably  be 
viewed  as  an  increased  reliance  on  one  of  the  few  deterministic  relationships 
available,  that  of  the  source-filter  theory  of  speech  production  (see  especially 
Fant, 1960). This theory states what acoustical output will result from a given
configuration of the speech production mechanism. Moreover, since the
mapping  from  articulation  to  acoustics  is  unique  (a  given  configuration 
produces  exactly  one  specified  sound)  while  the  mapping  in  the  opposite 
direction  is  not  (some  sounds  can  be  produced  by  quite  different  articulatory 
configurations),  researchers  have  tended  to  start  from  the  articulation,  in 
order  to  avoid  the  problem  of  deciphering  the  acoustic  signal.  Much 
important  research  has  been  done  on  articulation  in  its  own  right,  such  as 
articulatory  modeling  (e.g.,  Mermelstein,  1973)  and  task-dynamics  (Kelso, 
Saltzman,  & Tuller,  1986).  A great  deal  of  important  work  with  an 
articulatory  basis  has  also  been  done  on  speech  perception  and  phonology, 
where  the  a priori  utility  of  an  articulatory  starting  point  is  less  evident. 
Examples  include  motor  theory  (Liberman  & Mattingly,  1985),  direct 
perception  (Fowler,  1986;  Best,  1995),  the  numerous  phonologies  whose 
feature  set  is  descended  from  that  founded  in  Sound  Pattern  of  English 
(Chomsky  & Halle,  1968),  and  articulatory  phonology  (Browman  & Goldstein, 
1986  & 1990). 

A different  line  of  inquiry  implicit  in  the  Preliminaries  program 
concerns  the  decoding  of  feature  contrasts  from  the  acoustic  signal.  Important 
work  has  been  done  on  this  theme,  notably  by  engineers  working  on 
automatic  speech  recognition,  by  psychologists  investigating  general  problems 
of  perception  or  cognition,  by  phoneticians  documenting  particular  language 
phenomena,  and  by  combinations  of  these  scientists  in  collaboration.  This 
theme, however, has suffered from two important weaknesses. First, of the
three research projects most centrally related to it, the invariance and feature
detector hypotheses are generally acknowledged to have failed, and the hopes
for categorical perception as a key to understanding speech perception have
been severely curtailed. Second, in contrast to the articulatory-based efforts
noted above, the acoustic decoding theme has suffered from the lack of a
common,  acknowledged  organizing  principle. 

This dissertation presents an autonomous, language-independent
method for quantifying the language-dependent perceptual weight of acoustic
cues, separate from the language-specific setting of the cues. This method
can  be  an  important  new  tool  for  inquiry  in  the  acoustic  decoding  theme,  and 
it  has  the  potential  for  starting  an  important  new  line  of  research  on  the 
underlying  commonalities  in  the  managed  perceptual  exploitation  of  cue 
variation.  If  the  results  from  such  research  prove  promising,  this  method  (or 
some  more  refined  successor)  might  serve  as  the  common  organizing 
principle  for  the  acoustic  decoding  theme,  with  psychoacousticians  testing 
discriminability  limits  on  cue  setting  proximity,  cognitive  psychologists 
investigating  cue  memory  and  processing  patterns,  engineers  developing 
automatic  cue  recognizers  for  the  stronger  cues,  and  of  course  linguistic 
phoneticians  documenting  cue  behavior  across  languages,  during  first  and 
second  language  acquisition,  across  sociolects,  and  through  historical  sound 
changes. 

In  fact,  this  method  could  even  be  adapted  for  use  within  the 
articulatory  theme  if,  instead  of  cues,  production  control  parameters  and  their 
internal  distributions  drive  an  articulatory  model  to  generate  the  perceptual 
stimulus  series.  It  is  premature,  though,  to  imagine  that  this  method  might 
lead  to  a unified  account  of  the  management  of  phonetic  variability  in  both 
perception  and  production. 


9.4  Implications 


Cues  and  their  perception,  whether  acknowledged  or  not,  are  at  the 
center  of  many  current  topics  in  phonetics  and  speech-related  fields.  Consider 
the  following  topics,  and  the  cue-centered  questions  they  potentially  address: 

Phonology: Can neutralization (e.g., of word-final stop consonant voice
contrasts) be considered total if minor yet consistent acoustic
differences remain between the two neutralized phones? (e.g.,
Port & Crawford, 1989; Faber & Di Paolo, 1995)

Historical  linguistics:  What  are  the  triggers  of  common  conditioned 
sound  changes  (e.g.,  palatalization  of  apical  stops  before  high  front 
vowels),  or  conversely,  what  prevents  them  from  automatically 
triggering?  (Weinreich,  Labov,  & Herzog,  1968) 

1st  language  acquisition:  How  do  the  acoustic  properties  of  the  child's 
linguistic  environment  guide  the  development  of  the  child's 
phonology?  (e.g.,  Beckman,  1993) 

2nd  language  acquisition:  What  kind  of  listening  practice  is  most 
advisable  for  second  language  pedagogy?  (e.g.,  Logan,  Lively,  & 
Pisoni,  1991)  Are  its  benefits  limited  to  speech  understanding,  or 
does  it  also  result  in  more  native-like  speech  production?  (e.g., 
Flege,  1991;  Landahl  & Ziolkowski,  1995) 

Sociolinguistics:  Do  pidgin  and  creole  languages  simplify  the  phonetic 
and  phonological  structure  of  the  lexifier  language  to  fit 
unmarked  defaults  supplied  by  an  innate  "bioprogram,"  or  do 
they  recast  only  the  structures  disallowed  by  the  substrate 
languages?  (e.g.,  Muysken  & Smith,  1986) 

Speech technology: Which acoustic properties give the most robust
distinction between different phonetic segments in natural
speech? How should they be manipulated to make synthesis
sound natural? (e.g., Section V, "Speech Assessment," of Bailly,
Benoit, & Sawallis, 1992, and especially the article by Fourcin)

Speech perception: How do listeners map the continuous speech signal
onto the discrete units of the lexicon, and what are those units?
Are there intermediate stages in the process, each treating
different levels of unit? (e.g., Miller & Eimas, 1995)


All  these  fields  are  currently  limited  to  educated  guesswork  when  it 
comes  to  quantification  of  the  importance  of  particular  cues  in  the  context 
studied. Their conclusions are thus constrained by the appropriateness of
those guesses, which cannot be verified. Accordingly, these fields could benefit from
such quantification through stronger conclusions, better-directed research, or both.
For  instance: 

Final devoicing cannot be considered true neutralization until several
established preconsonantal cues to intervocalic voicing have been
shown to be ineffective in final position.

If  creoles  reveal  traits  of  an  innate  bioprogram  for  language,  then 
English-based creoles worldwide should be found to have
eliminated  the  exaggerated  emphasis  found  in  standard  Englishes 
on  use  of  vowel  length  as  a cue  to  consonantal  voice. 

Naturalness  of  speech  synthesis  should  be  most  greatly  improved  by 
targeting  those  traits  which  are  most  perceptually  important  for 
listeners,  and  those  traits  can  be  expected  to  vary  across  languages. 


The  quantitative  modeling  of  cue  importance  proposed  in  this  study 
thus holds distinct potential for progress in each of the above-mentioned
fields. There is further potential if all these fields illuminate one
another through intersecting work on cues.


APPENDIX  A 
SUBJECT  MATERIALS 


This  appendix  contains  the  materials  that  were  presented  to  the 
subjects  in  order  to  take  the  perception  tests.  These  materials  consisted  of  a 
set  of  instructions  with  a sample  test,  plus  the  main  answer  sheet,  plus  an 
interview  sheet  for  the  subject's  background  information.  The  answer  sheet 
and  the  sample  test  exist  in  two  versions  in  order  to  balance  any  possible 
preference  in  the  order  of  the  answers.  Finally,  there  is  the  French  text  of  the 
fable "La Bise et le soleil" (IPA, 1949), which was recorded by the speaker and
played to the subjects before the test in order to accustom them to the
speaker's voice.



Instructions 


Tout d'abord, je vous remercie de participer à cette recherche. C'est en partie grâce à des
volontaires comme vous que la recherche scientifique continue de progresser. Nous vous
sommes donc reconnaissants de votre aide.


À propos de vous

Cette étude porte sur le contenu acoustique des sons de la parole, et sur l'utilisation que nous
en faisons quand nous écoutons parler les autres. Des personnes originaires de différentes
régions de France ou du monde parlent différemment, et l'on peut donc supposer qu'elles
écoutent différemment. Pour cette raison, après le test, je vous poserai diverses questions
(anodines) concernant votre expérience linguistique, mais le plus important est votre ouïe. Si vous
avez déjà eu des problèmes d'ouïe, veuillez me l'indiquer dès à présent.


Le test, c'est quoi ?


Vous allez écouter une liste composée de “mots” que voici :

ditite  datate  doutoute

didite  dadate  doudoute

Ces mots vous seront répétés dans un ordre aléatoire, en nombre égal.

Vous remarquerez que les mots d'une paire diffèrent uniquement par la consonne du milieu,
soit T soit D. Il vous est demandé, pour chaque mot, d'indiquer ce que vous avez entendu : D ou T.

L'intérêt de cette expérience tient en les modifications que j'ai apportées à la plupart des mots,
ainsi vous pouvez entendre T lorsque l'on a prononcé D, et entendre D lorsque l'on a prononcé T.

Vos réponses, ainsi que celles des autres sujets, me permettront de mesurer l'ampleur de ces
modifications. Ainsi, s'il y a parfois des sons difficiles à décider, ne cherchez pas la “réponse
correcte”, essayez simplement d'indiquer votre première impression. En effet, il n'y a pas de
bonne ou de mauvaise réponse, vos réponses m'aideront à distinguer les bonnes et mauvaises
modifications acoustiques.



Comment répondre


Vous allez entendre chaque mot prononcé une seule fois,
puis vous devrez répondre rapidement. Pour indiquer votre
réponse, vous avez une feuille numérotée avec les deux cas
possibles portés après chaque numéro.

Encerclez la lettre qui correspond au son que vous avez
entendu. Si vous vous trompez, barrez votre erreur, et mettez un
cercle autour de la réponse finalement choisie. Vous trouverez
des exemples de réponses ci-contre.

Pour éviter les décalages, vous entendrez un bip juste avant
les numéros 10, 20, 30... Sur la feuille de réponse, le bip est
rappelé par un trait entre les numéros 9 et 10, 19 et 20, ...
Utilisez le bip pour vous repositionner si vos réponses sont
décalées.


Vous entendez :   Vous marquez :
ditite            7)  Ⓣ D
doudoute          8)  T Ⓓ
didite            9)  T Ⓓ
BIP
datate            10) Ⓣ D
dadate            11) [illegible]
ditite            12) Ⓣ D

Un peu d'entraînement

Maintenant, nous allons faire un petit essai, pour vous y habituer, mais en utilisant les mots :
bipip  bapap  boupoup

bibip  babap  bouboup

vous avez donc à identifier soit le B ou le P, et à entourer la lettre correspondant à votre réponse.
Quand vous serez prêt, faites-moi signe, et si vous avez des questions vous pouvez me les poser
maintenant.


À vos marques, prêt, ...

La voix que vous avez entendue est celle du test réel.

Vous allez maintenant faire le test réel, mais vous allez entendre d'abord une petite histoire,
pour vous aider à vous habituer à cette voix.

Le test commence immédiatement après cette histoire, et le premier mot est précédé par un
bip. Faites-moi signe pour mettre en marche le magnétophone.



Comment répondre


Vous allez entendre chaque mot prononcé une seule fois,
puis vous devrez répondre rapidement. Pour indiquer votre
réponse, vous avez une feuille numérotée avec les deux cas
possibles portés après chaque numéro.

Encerclez la lettre qui correspond au son que vous avez
entendu. Si vous vous trompez, barrez votre erreur, et mettez un
cercle autour de la réponse finalement choisie. Vous trouverez
des exemples de réponses ci-contre.

Pour éviter les décalages, vous entendrez un bip juste avant
les numéros 10, 20, 30... Sur la feuille de réponse, le bip est
rappelé par un trait entre les numéros 9 et 10, 19 et 20, ...
Utilisez le bip pour vous repositionner si vos réponses sont
décalées.


Vous entendez :   Vous marquez :
didite            7)  Ⓓ T
doutoute          8)  D Ⓣ
ditite            9)  D Ⓣ
BIP
dadate            10) Ⓓ T
datate            11) [illegible]
didite            12) Ⓓ T

Un peu d'entraînement

Maintenant, nous allons faire un petit essai, pour vous y habituer, mais en utilisant les mots :
bipip  bapap  boupoup

bibip  babap  bouboup

vous avez donc à identifier soit le B ou le P, et à entourer la lettre correspondant à votre réponse.
Quand vous serez prêt, faites-moi signe, et si vous avez des questions vous pouvez me les poser
maintenant.


À vos marques, prêt, ...

La voix que vous avez entendue est celle du test réel.

Vous allez maintenant faire le test réel, mais vous allez entendre d'abord une petite histoire,
pour vous aider à vous habituer à cette voix.

Le test commence immédiatement après cette histoire, et le premier mot est précédé par un
bip. Faites-moi signe pour mettre en marche le magnétophone.



Instructions 


First  of  all,  I thank  you  for  participating  in  this  research.  It  is  partly  thanks  to  volunteers  like  you 
that  scientific  research  continues  to  progress.  We  are  grateful  for  your  help. 


About  you 

This study concerns the acoustic content of speech sounds and our use of it when we listen
to others speaking. People from different regions of France or of the world speak differently, and we
can therefore suppose that they listen differently. For this reason, after the test I will ask you various
(harmless) questions about your linguistic experience, but the most important thing is your hearing. If
you have ever had any hearing problems, please tell me right away.


What  is  this  test? 


You  are  going  to  hear  a list  of  the  following  words: 

ditite  datate  doutoute 

didite  dadate  doudoute

These  words  will  be  repeated  to  you  randomly,  in  equal  numbers. 

You will notice that the words in a pair differ only in the middle consonant, either a T or a
D. For each word, you are asked to indicate which one you heard: D or T.

The  point  of  this  experiment  lies  in  the  changes  I have  made  to  most  of  the  words,  so  that  you 
may  hear  a T when  a D was  said,  or  a D when  a T was  said. 

Your  answers,  with  those  of  the  other  subjects,  will  allow  me  to  measure  the  magnitude  of 
these  changes.  So,  if  there  are  sometimes  sounds  which  are  difficult  to  decide  about,  don't  look 
for  the  "correct"  answer,  just  try  to  indicate  your  first  impression.  Actually,  there  are  no  right  or  wrong 
answers  — your  answers  will  help  me  distinguish  between  good  and  bad  acoustic  modifications. 



How  to  answer 


You will hear each word spoken one single time, then you
must respond quickly. To indicate your answer, you have a
numbered sheet with the two possible answers by each number.

Circle the letter which corresponds to the sound you heard. If
you make a mistake, put a line through your error and circle the
answer you finally decide on. You will find examples opposite.

To  avoid  gaps,  you  will  hear  a beep  just  before  the  numbers 
10,  20,  30,  ...  On  the  answer  sheet,  the  beep  is  indicated  by  a 
line  between  9 and  10, 19  and  20, ...  Use  the  beep  to  reposition 
yourself  if  there  is  a gap  in  your  answers. 


You hear:   You mark:
ditite      7)  Ⓣ D
doudoute    8)  T Ⓓ
didite      9)  T Ⓓ
BEEP
datate      10) Ⓣ D
dadate      11) [illegible]
ditite      12) Ⓣ D

A little  practice 

Now  we  will  do  a little  sample  test  to  get  you  used  to  it,  but  using  the  words: 

bipip  bapap  boupoup 

bibip  babap  bouboup 

so  you  have  to  identify  either  a B or  a P,  and  circle  the  letter  corresponding  to  your  answer.  When 
you  are  ready,  let  me  know,  and  if  you  have  any  questions  you  can  ask  me  now. 


On  your  mark,  get  set,  ... 

The  voice  you  heard  is  the  one  from  the  real  test. 

Now  you  will  start  the  real  test,  but  first  you  will  hear  a little  story,  to  help  you  get  used  to  the  voice. 
The  test  starts  immediately  after  the  story,  and  the  first  word  is  preceded  by  a beep.  Let  me 
know  when  to  start  the  tape. 



How  to  answer 


You will hear each word spoken one single time, then you
must respond quickly. To indicate your answer, you have a
numbered sheet with the two possible answers by each number.

Circle the letter which corresponds to the sound you heard. If
you make a mistake, put a line through your error and circle the
answer you finally decide on. You will find examples opposite.

To  avoid  gaps,  you  will  hear  a beep  just  before  the  numbers 
10,  20,  30,  ...  On  the  answer  sheet,  the  beep  is  indicated  by  a 
line  between  9 and  10, 19  and  20, ...  Use  the  beep  to  reposition 
yourself  if  there  is  a gap  in  your  answers. 


You hear:   You mark:
didite      7)  Ⓓ T
doutoute    8)  D Ⓣ
ditite      9)  D Ⓣ
BEEP
dadate      10) Ⓓ T
datate      11) [illegible]
didite      12) Ⓓ T

A little  practice 

Now  we  will  do  a little  sample  test  to  get  you  used  to  it,  but  using  the  words: 

bipip  bapap  boupoup 

bibip  babap  bouboup 

so  you  have  to  identify  either  a B or  a P,  and  circle  the  letter  corresponding  to  your  answer.  When 
you  are  ready,  let  me  know,  and  if  you  have  any  questions  you  can  ask  me  now. 


On  your  mark,  get  set,  ... 

The  voice  you  heard  is  the  one  from  the  real  test. 

Now  you  will  start  the  real  test,  but  first  you  will  hear  a little  story,  to  help  you  get  used  to  the  voice. 
The  test  starts  immediately  after  the  story,  and  the  first  word  is  preceded  by  a beep.  Let  me 
know  when  to  start  the  tape. 



Answer sheet, version 1 (items 1 to 174; a rule before items 10, 20, 30, ... marks each beep)

  1) T D    30) T D    59) T D    88) T D   117) T D   146) T D
  2) T D    31) T D    60) T D    89) T D   118) T D   147) T D
  3) T D    32) T D    61) T D    90) T D   119) T D   148) T D
  4) T D    33) T D    62) T D    91) T D   120) T D   149) T D
  5) T D    34) T D    63) T D    92) T D   121) T D   150) T D
  6) T D    35) T D    64) T D    93) T D   122) T D   151) T D
  7) T D    36) T D    65) T D    94) T D   123) T D   152) T D
  8) T D    37) T D    66) T D    95) T D   124) T D   153) T D
  9) T D    38) T D    67) T D    96) T D   125) T D   154) T D
 10) T D    39) T D    68) T D    97) T D   126) T D   155) T D
 11) T D    40) T D    69) T D    98) T D   127) T D   156) T D
 12) T D    41) T D    70) T D    99) T D   128) T D   157) T D
 13) T D    42) T D    71) T D   100) T D   129) T D   158) T D
 14) T D    43) T D    72) T D   101) T D   130) T D   159) T D
 15) T D    44) T D    73) T D   102) T D   131) T D   160) T D
 16) T D    45) T D    74) T D   103) T D   132) T D   161) T D
 17) T D    46) T D    75) T D   104) T D   133) T D   162) T D
 18) T D    47) T D    76) T D   105) T D   134) T D   163) T D
 19) T D    48) T D    77) T D   106) T D   135) T D   164) T D
 20) T D    49) T D    78) T D   107) T D   136) T D   165) T D
 21) T D    50) T D    79) T D   108) T D   137) T D   166) T D
 22) T D    51) T D    80) T D   109) T D   138) T D   167) T D
 23) T D    52) T D    81) T D   110) T D   139) T D   168) T D
 24) T D    53) T D    82) T D   111) T D   140) T D   169) T D
 25) T D    54) T D    83) T D   112) T D   141) T D   170) T D
 26) T D    55) T D    84) T D   113) T D   142) T D   171) T D
 27) T D    56) T D    85) T D   114) T D   143) T D   172) T D
 28) T D    57) T D    86) T D   115) T D   144) T D   173) T D
 29) T D    58) T D    87) T D   116) T D   145) T D   174) T D


Answer sheet, version 2 (items 1 to 174; a rule before items 10, 20, 30, ... marks each beep)

  1) D T    30) D T    59) D T    88) D T   117) D T   146) D T
  2) D T    31) D T    60) D T    89) D T   118) D T   147) D T
  3) D T    32) D T    61) D T    90) D T   119) D T   148) D T
  4) D T    33) D T    62) D T    91) D T   120) D T   149) D T
  5) D T    34) D T    63) D T    92) D T   121) D T   150) D T
  6) D T    35) D T    64) D T    93) D T   122) D T   151) D T
  7) D T    36) D T    65) D T    94) D T   123) D T   152) D T
  8) D T    37) D T    66) D T    95) D T   124) D T   153) D T
  9) D T    38) D T    67) D T    96) D T   125) D T   154) D T
 10) D T    39) D T    68) D T    97) D T   126) D T   155) D T
 11) D T    40) D T    69) D T    98) D T   127) D T   156) D T
 12) D T    41) D T    70) D T    99) D T   128) D T   157) D T
 13) D T    42) D T    71) D T   100) D T   129) D T   158) D T
 14) D T    43) D T    72) D T   101) D T   130) D T   159) D T
 15) D T    44) D T    73) D T   102) D T   131) D T   160) D T
 16) D T    45) D T    74) D T   103) D T   132) D T   161) D T
 17) D T    46) D T    75) D T   104) D T   133) D T   162) D T
 18) D T    47) D T    76) D T   105) D T   134) D T   163) D T
 19) D T    48) D T    77) D T   106) D T   135) D T   164) D T
 20) D T    49) D T    78) D T   107) D T   136) D T   165) D T
 21) D T    50) D T    79) D T   108) D T   137) D T   166) D T
 22) D T    51) D T    80) D T   109) D T   138) D T   167) D T
 23) D T    52) D T    81) D T   110) D T   139) D T   168) D T
 24) D T    53) D T    82) D T   111) D T   140) D T   169) D T
 25) D T    54) D T    83) D T   112) D T   141) D T   170) D T
 26) D T    55) D T    84) D T   113) D T   142) D T   171) D T
 27) D T    56) D T    85) D T   114) D T   143) D T   172) D T
 28) D T    57) D T    86) D T   115) D T   144) D T   173) D T
 29) D T    58) D T    87) D T   116) D T   145) D T   174) D T

[Scanned forms (two pages), illegible in this copy. The only recoverable field label is "Langues : (niveaux)" ("Languages: (levels)").]

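La Bise et le Soleil

La bise et le soleil se disputaient, chacun assurant
qu'il était le plus fort, quand ils ont vu un voyageur
qui avançait, enveloppé dans son manteau. Ils sont
tombés d'accord, que celui qui arriverait le premier
à faire ôter son manteau au voyageur, serait regardé
comme le plus fort. Alors la bise s'est mise à souffler
de toute sa force; mais plus elle soufflait, plus le
voyageur serrait son manteau autour de lui, et à la
fin la bise a renoncé à le lui faire ôter. Alors le soleil
a commencé à briller, et au bout d'un moment le
voyageur, réchauffé, a ôté son manteau. Ainsi la bise
a dû reconnaître que le soleil était le plus fort des
deux.

(This passage, "The North Wind and the Sun," translates as: "The north
wind and the sun were arguing, each claiming to be the stronger, when they
saw a traveler coming along, wrapped in his coat. They agreed that whichever
of them first made the traveler take off his coat would be considered the
stronger. Then the north wind began to blow with all its might; but the
harder it blew, the more tightly the traveler wrapped his coat around
himself, and in the end the north wind gave up trying to make him take it
off. Then the sun began to shine, and after a moment the traveler, warmed
up, took off his coat. And so the north wind had to admit that the sun was
the stronger of the two.")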


Attention, le test va commencer. Répondez en
respectant les instructions déjà spécifiées.
Silence, ... Top! C'est parti.


(This  last  paragraph  translates  as:  "Attention,  the  test  is  starting. 
Answer  according  to  the  instructions  already  given.  Silence,  ...  Go!") 


APPENDIX  B 
RESPONSE  DATA 


The following six tables report the responses to each stimulus token, by
subject group (naive natives NN, naive foreign NF, linguist natives LN, and
linguist foreign LF) and pooled across subject groups (Total); T and D give
the numbers of /t/ and /d/ responses. Table B.7 is a key to the assignment of
stimuli within the factorial arrays. Together, these tables provide the
foundation on which the data tables in Appendix C are based.
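As a rough illustration of how an Appendix C cell draws on these tables, the Python sketch below pools one subject group's counts over the three stimuli that Table B.7 assigns to a single cell. It assumes that an error to a /d/ stimulus is a T response (and to a /t/ stimulus a D response); the names `NN_COUNTS` and `pool_cell` are hypothetical, and the counts are the NN columns for da23a2, di01a2, and du02a2 (the D-Amp, High, 2-SD cell) copied from Tables B.5, B.4, and B.6.

```python
# Hypothetical helper: pool (T, D) response counts over the stimuli that
# Table B.7 assigns to one cell of a cue's factorial array.

# NN group's (T, D) counts, copied from Tables B.5, B.4 and B.6.
NN_COUNTS = {
    "da23a2": (0, 53),
    "di01a2": (0, 51),
    "du02a2": (1, 49),
}


def pool_cell(stimuli, counts, error_response="T"):
    """Return (errors, total) summed over the given stimuli.

    error_response is "T" for /d/ stimuli and "D" for /t/ stimuli.
    """
    index = 0 if error_response == "T" else 1
    errors = sum(counts[s][index] for s in stimuli)
    total = sum(t + d for t, d in (counts[s] for s in stimuli))
    return errors, total


# D-Amp, High original contrast, 2-SD contrast reduction
errors, total = pool_cell(["da23a2", "di01a2", "du02a2"], NN_COUNTS)
print(errors, total, round(errors / total, 4))  # 1 154 0.0065
```

The result, 1 error in 154 responses (0.0065), is the figure reported for the NN, D-Amp, High, 2 cell in the first "by GROUP" table of Appendix C.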


Table B.1

Responses to /ti/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
ti01d4      121    4   48    2   26    1   24    1   23    0
ti04d4      129    0   50    0   29    0   27    0   23    0
ti05a2      129    0   51    0   29    0   27    0   22    0
ti06r       126    1   53    0   27    0   26    0   20    1
ti07a2      127    0   52    0   27    0   25    0   23    0
ti08a6      127    2   49    2   29    0   27    0   22    0
ti10d2      122    3   49    1   25    2   25    0   23    0
ti11d6      126    0   52    0   26    0   25    0   23    0
ti12r       126    3   49    1   29    0   26    1   22    1
ti13d2      125    0   50    0   27    0   25    0   23    0
ti14d6      122    4   50    3   27    0   26    0   19    1
ti15d2      126    1   51    1   27    0   25    0   23    0
ti16r       125    0   50    0   27    0   25    0   23    0
ti17a6      125    1   52    0   25    1   25    0   23    0
ti18r       128    1   50    1   29    0   27    0   22    0
ti19a2      127    1   50    0   28    1   27    0   22    0
ti20a4      125    0   50    0   27    0   25    0   23    0
ti21a6      124    1   50    0   26    1   25    0   23    0
ti22a4      123    4   50    2   25    2   25    0   23    0
ti23d6      119    6   46    4   26    1   24    1   23    0
ti24a4      128    2   50    1   29    0   27    0   22    1
ti25r       126    0   52    0   27    0   24    0   23    0
ti26d4      124    3   50    0   27    1   25    2   22    0


Table B.2

Responses to /ta/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
ta01r       125    1   51    1   26    0   25    0   23    0
ta02a6      118    8   46    5   27    1   22    1   23    1
ta03d4      122    2   48    1   26    1   25    0   23    0
ta04r       126    2   50    0   29    0   26    1   21    1
ta05d2      123    3   50    2   27    0   24    0   22    1
ta06r       124    2   50    1   27    1   23    0   24    0
ta07a4      124    3   50    2   26    1   25    0   23    0
ta08r       123    3   51    0   27    1   22    1   23    1
ta09d6      108   17   41    9   23    4   21    4   23    0
ta10a6      123    2   49    2   28    0   23    0   23    0
ta11r       127    0   53    0   27    0   26    0   21    0
ta12a6      119    6   48    2   27    1   22    1   22    2
ta13d6      126    1   52    1   27    0   26    0   21    0
ta14r       127    2   49    2   29    0   27    0   22    0
ta15d6      119    9   46    4   27    2   25    2   21    1
ta16d4      123    3   49    2   27    1   23    0   24    0
ta17r       125    2   51    2   27    0   26    0   21    0
ta18a2      126    0   51    0   28    0   23    0   24    0
ta19a4      125    2   50    2   27    0   25    0   23    0
ta20a2      126    0   52    0   26    0   25    0   23    0
ta21d4      122    4   50    2   25    2   26    0   21    0
ta22a4      125    4   49    1   28    1   25    2   23    0
ta23d2      123    2   49    2   28    0   23    0   23    0
ta24d2      125    1   51    0   28    0   22    1   24    0
ta25a2      123    3   49    3   26    0   25    0   23    0


Table B.3

Responses to /tu/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
tu01d6      118    7   45    5   26    1   25    0   22    1
tu02d4      124    2   49    2   28    0   23    0   24    0
tu03r       127    2   50    1   29    0   26    1   22    0
tu04a4      122    3   48    2   26    1   25    0   23    0
tu05d4      123    3   51    0   27    1   23    0   22    2
tu06d2      124    2   49    2   28    0   23    0   24    0
tu07a2      128    1   50    1   29    0   27    0   22    0
tu08d4      119    6   49    1   24    3   23    2   23    0
tu09d2      124    2   52    1   26    0   25    1   21    0
tu10r       123    0   50    0   26    0   25    0   22    0
tu11a6      124    1   50    0   26    1   25    0   23    0
tu12a4      122    2   49    1   27    0   22    1   24    0
tu13r       127    1   50    0   29    0   26    1   22    0
tu14d2      125    1   51    0   27    1   23    0   24    0
tu15a6      122    3   49    1   26    1   24    1   23    0
tu17r       126    1   53    0   27    0   25    1   21    0
tu18r       125    2   52    1   26    1   26    0   21    0
tu19d6      108   19   42   11   26    1   22    4   18    3
tu20a4      126    3   49    2   29    0   26    1   22    0
tu21a2      125    1   50    1   28    0   23    0   24    0
tu22d6      118    5   46    3   25    1   24    1   23    0
tu23a6      120    5   46    2   27    1   26    1   21    1
tu24a2      125    0   50    0   27    0   25    0   23    0
tu25r       127    0   53    0   27    0   26    0   21    0

Table B.4

Responses to /di/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
di01a2        1  125    0   51    1   27    0   23    0   24
di02a2        1  127    1   49    0   29    0   27    0   22
di03r         3  124    0   52    2   25    1   24    0   23
di04a4        4  122    2   50    0   26    0   25    2   21
di05r         3  122    0   51    2   25    0   23    1   23
di06d4        1  125    1   51    0   26    0   25    0   23
di07a6       20  105    7   43   11   16    1   24    1   22
di08d4        2  126    1   49    1   28    0   27    0   22
di09a6        7  122    3   48    3   26    0   27    1   21
di10d6        3  122    1   49    0   27    1   24    1   22
di11a4        0  129    0   51    0   29    0   27    0   22
di12d6        5  120    0   50    2   25    2   23    1   22
di13d2        3  126    1   50    1   28    1   26    0   22
di14r         4  122    0   53    3   24    0   26    1   19
di15a6       58   69   25   27   15   12    7   18   11   12
di16r         8  119    3   50    0   27    3   23    2   19
di17d4       11  116    5   48    2   25    2   24    2   19
di18d2        4  123    1   52    2   25    1   25    0   21
di19d6        4  123    0   52    3   24    0   25    1   22
di20a2        4  121    1   49    2   25    0   25    1   22
di21r         6  117    3   47    2   25    0   23    1   22
di22a4       35   91   14   38    8   18    7   18    6   17
di23d2        7  119    4   47    1   27    0   23    2   22
di24r         0  129    0   51    0   29    0   27    0   22
di25r         1  125    1   50    0   28    0   23    0   24


Table B.5

Responses to /da/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
da01r         2  124    2   51    0   26    0   26    0   21
da02a4        1  126    0   53    1   26    0   26    0   21
da03d2        1  126    0   53    1   26    0   26    0   21
da06d4        0  127    0   53    0   27    0   26    0   21
da07a2        2  125    1   52    1   26    0   26    0   21
da08a6        2  123    2   48    0   27    0   25    0   23
da09a2        3  124    2   50    0   27    1   24    0   23
da10d2        0  127    0   52    0   27    0   25    0   23
da11d4        5  122    2   50    2   25    0   25    1   22
da12r         0  129    0   51    0   29    0   27    0   22
da13d6        5  120    2   48    2   25    0   25    1   22
da14a4        0  127    0   52    0   27    0   25    0   23
da15d6        1  125    0   52    1   25    0   25    0   23
da16r         1  124    1   49    0   27    0   25    0   23
da17a4        1  124    0   50    1   26    0   25    0   23
da18r         0  129    0   51    0   29    0   27    0   22
da19d2        0  127    0   53    0   27    0   26    0   21
da20r         3  124    3   50    0   27    0   26    0   21
da21d6        2  127    0   51    1   28    0   27    1   21
da22a6        3  123    2   49    1   27    0   23    0   24
da23a2        1  126    0   53    0   27    1   25    0   21
da24a6        2  125    1   52    1   26    0   26    0   21
da25d4        4  122    1   50    1   27    0   23    2   22


Table B.6

Responses to /du/ stimuli.

Group       Total      NN        NF        LN        LF
Response      T    D    T    D    T    D    T    D    T    D
Stimulus
du01r         3  124    1   52    0   27    1   25    1   20
du02a2        1  123    1   49    0   26    0   25    0   23
du03d4        3  122    1   50    2   25    0   25    0   22
du04d6        5  123    2   48    1   28    1   26    1   21
du05a2        2  124    2   49    0   28    0   23    0   24
du06d2        3  123    1   50    1   27    0   23    1   23
du07r         1  125    0   53    1   26    0   25    0   21
du08r         0  125    0   50    0   27    0   25    0   23
du09d4        1  124    0   50    1   26    0   25    0   23
du10d4        0  126    0   52    0   26    0   25    0   23
du11a6        4  122    2   50    0   26    0   25    2   21
du12d6        1  122    1   49    0   27    0   23    0   23
du13r         0  125    0   50    0   27    0   25    0   23
du14d2        0  126    0   52    0   26    0   25    0   23
du15a4        3  123    3   49    0   27    0   24    0   23
du16r         4  122    4   47    0   28    0   23    0   24
du17a4        4  124    2   48    1   28    1   26    0   22
du18a6        0  128    0   50    0   29    0   27    0   22
du19d2        2  124    1   50    1   27    0   23    0   24
du20r         3  121    2   49    1   26    0   23    0   23
du21a4        2  125    1   52    1   26    0   26    0   21
du22a2        1  126    0   53    1   26    0   26    0   21
du23r         1  126    0   53    1   26    0   26    0   21
du24d6        2  125    0   52    2   25    0   25    0   23
du25a6        1  125    1   50    0   28    0   23    0   24


Table B.7

Stimulus Assignments for Cue Factorial Arrays.

D-Amp       /a/                     /i/                     /u/
            Δ2      Δ4      Δ6      Δ2      Δ4      Δ6      Δ2      Δ4      Δ6
High        da23a2  da14a4  da08a6  di01a2  di04a4  di07a6  du02a2  du21a4  du25a6
Med         da07a2  da02a4  da24a6  di20a2  di11a4  di15a6  du22a2  du17a4  du11a6
Low         da09a2  da17a4  da22a6  di02a2  di22a4  di09a6  du05a2  du15a4  du18a6
Raw         da01r   da16r           di21r   di14r           du07r   du20r

D-Dur       /a/                     /i/                     /u/
            Δ2      Δ4      Δ6      Δ2      Δ4      Δ6      Δ2      Δ4      Δ6
High        da03d2  da11d4  da13d6  di18d2  di06d4  di12d6  du19d2  du03d4  du24d6
Med         da10d2  da06d4  da15d6  di13d2  di17d4  di10d6  du06d2  du09d4  du12d6
Low         da19d2  da25d4  da21d6  di23d2  di08d4  di19d6  du14d2  du10d4  du04d6
Raw         da20r   da01r           di24r   di16r           du13r   du01r

Extra [d]s  da12r   da18r           di03r   di05r   di25r   du08r   du16r   du23r

T-Amp       /a/                     /i/                     /u/
            Δ2      Δ4      Δ6      Δ2      Δ4      Δ6      Δ2      Δ4      Δ6
High        ta25a2  ta07a4  ta02a6  ti07a2  ti24a4  ti17a6  tu21a2  tu04a4  tu23a6
Med         ta18a2  ta19a4  ta10a6  ti05a2  ti20a4  ti21a6  tu24a2  tu20a4  tu15a6
Low         ta20a2  ta22a4  ta12a6  ti19a2  ti22a4  ti08a6  tu07a2  tu12a4  tu11a6
Raw         ta14r   ta17r           ti18r   ti16r           tu03r   tu10r

T-Dur       /a/                     /i/                     /u/
            Δ2      Δ4      Δ6      Δ2      Δ4      Δ6      Δ2      Δ4      Δ6
High        ta05d2  ta16d4  ta13d6  ti15d2  ti04d4  ti11d6  tu09d2  tu05d4  tu22d6
Med         ta23d2  ta03d4  ta15d6  ti10d2  ti26d4  ti14d6  tu06d2  tu08d4  tu01d6
Low         ta24d2  ta21d4  ta09d6  ti13d2  ti01d4  ti23d6  tu14d2  tu02d4  tu19d6
Raw         ta06r   ta11r           ti06r   ti12r           tu25r   tu18r

Extra [t]s  ta01r   ta04r   ta08r   ti25r                   tu13r   tu17r

APPENDIX  C 
DATA  TABLES 


The tables in this appendix contain all the detailed data and results of the
analyses in this study. The layout of the tables is regular but complex; the
major findings are presented in a more approachable graphic form in
Chapter 7, while these tables document the numerical results.

Sequencing  across  the  Tables 

The full data are presented in three separate analysis sections: by Group,
by Group Pool, and by Vowel. The Group section (8 pages) separates the data
according to the four basic subject groups:

Naive Natives       NN
Linguist Natives    LN
Naive Foreign       NF
Linguist Foreign    LF

and presents them in that order. The Group Pool section (4 pages) eliminates
the distinction between Naives and Linguists, so it categorizes the data as
All Natives and All Foreign. Both the Group and Group Pool analyses pool
across the three vowels. The Vowel section pools across all four subject
groups and shows the data for the three vowels, /i/, /a/, and /u/, and then
for all three pooled as All Vs.

Since the All Vs analysis pools across both vowels and subject groups, it
represents, in a sense, the overall findings about the four cues, but it must
be remembered that it also masks important differences among vowels and
subject groups. Unfortunately, the very low overall error rate makes it futile
to attempt an analysis splitting out vowels and subject groups at the same
time: the ceiling effects would make too many comparisons impossible.

The data for the four cues are labeled as follows:

/d/ voicing amplitude    D-Amp
/d/ hold duration        D-Dur
/t/ burst amplitude      T-Amp
/t/ hold duration        T-Dur

and are presented in that order. The larger analysis sections are broken
down into their respective parts within each cue. Thus, within the Group
section, D-Amp data are presented for NN, LN, NF, and LF, and then the
presentation moves on to the D-Dur data. The analogous situation obtains for
All Natives and All Foreign within the Group Pool section, and for /i/, /a/,
/u/, and All Vs within the Vowel section.

Each  page  contains  two  basic  analysis  tables,  one  on  the  left  and  one  on 
the  right.  There  are  labels  at  the  top  left  of  each  page  indicating  the  section 
and  the  cue,  and  labels  at  the  top  of  both  tables  indicating  the  data  therein. 

For  example,  the  first  page  is  labeled  "by  Group"  and  "D-Amp"  at  the  top  left, 
and  the  tables  are  labeled  "NN"  on  the  left  and  "LN"  on  the  right. 

Note  that  column  labels  ("2,  4,  6,  Pool")  are  repeated  on  the  left  and 
right  tables,  but  that  the  row  labels  ("High,  Med,  ...")  are  only  given  once,  at 
the  far  left  side  of  the  page,  though  they  apply  to  both  tables. 

Layout  of  Information  Within  the  Tables 

Each  table  consists  of  a top  and  a bottom  part,  with  the  top  presenting 
the  numbers  and  proportions  of  responses,  and  the  bottom  using  those 
proportions  for  the  calculation  of  d'  values  and  confidence  intervals  on  d'. 




Response  Proportions 

The top of each table consists of four columns, labeled "2, 4, 6, Pool",
and five "meta-rows", labeled "High, Med, Low, Pool, Bounds". Each meta-row
groups four ordinary rows of information, so each intersection of a column
and a meta-row constitutes a cell containing four pieces of information.

The 3×3 grid of cells in the upper left of the top part reports responses
to the 3×3 factorial array of the stimulus design: 2, 4, and 6 standard
deviations of contrast reduction (columns) crossed with High, Medium, and
Low original contrast (rows). The fourth column and fourth row, labeled
"Pool," accumulate across the basic 3×3 grid, each thus eliminating the
other factor.
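The Pool margins are simple row and column sums. The snippet below is a minimal Python illustration, not the author's software; it assumes the counts are held as plain nested lists and uses the NN, D-Amp error counts and response totals from the first "by GROUP" table of this appendix (`ERRORS`, `TOTALS`, and `pool_margins` are hypothetical names).

```python
# NN group, D-Amp cue: error counts and response totals for the 3x3 array
# (rows: High, Med, Low original contrast; columns: 2, 4, 6 SD reduction).
ERRORS = [[1, 3, 10],
          [2, 2, 28],
          [5, 17, 5]]
TOTALS = [[154, 157, 151],
          [156, 154, 157],
          [153, 154, 152]]


def pool_margins(grid):
    """Return (row_pools, column_pools, grand_pool) for a 3x3 grid of counts."""
    row_pools = [sum(row) for row in grid]
    column_pools = [sum(col) for col in zip(*grid)]
    return row_pools, column_pools, sum(row_pools)


err_rows, err_cols, err_all = pool_margins(ERRORS)
tot_rows, tot_cols, tot_all = pool_margins(TOTALS)
print(err_cols, tot_cols)   # [8, 22, 43] [463, 465, 460]
print(err_rows, tot_rows)   # [14, 32, 27] [462, 467, 459]
print(err_all, tot_all)     # 73 1388
```

Pooling counts before dividing, rather than averaging the cell rates, keeps each pooled error rate weighted by the number of responses in its cells.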

The cells in the 2 and 4 columns of the Bounds meta-row report the responses
to the two boundary tokens, and the cell in the Pool column combines them.

Each  cell  contains  four  pieces  of  information.  At  the  top  of  the  cell  (in 
the  row  containing  the  meta-row  label)  is  the  number  of  error  (target) 
responses  to  the  stimuli  for  that  cell.  Below  that  is  the  total  number  of 
responses,  i.e.,  error  plus  correct  (target  plus  base).  Below  that,  in  turn,  is  the 
error  rate  (errors  divided  by  total  responses)  calculated  from  the  previous  two 
numbers. 

The last datum is either a number (typically less than 0.04) or the
notation "NV". This datum concerns the difficulty of doing a simple
statistical test of the differences among cells (see Chapter 7 for
discussion). The statistical literature contains several different criteria
concerning the validity of constructing confidence intervals on proportions
such as the error rates used in this research. Such confidence intervals
would be one simple way to check for statistically significant differences
between the error rates at various levels of the two factors. The most
liberal criterion found was that of Mendenhall & Beaver (1991): "The
interval p ± 2σ must be contained in the interval 0 to 1." The extremely low
error rates in this study make even this criterion a problem. The NV notation
in this row indicates that the data for the cell do not meet the criterion,
and that it would not be valid to construct a confidence interval around the
cell's error rate. Where a number is shown, the criterion is met, and the
number is the calculated value of σ, the standard error of the cell's error
rate.
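A minimal Python sketch of this check, assuming the standard binomial standard error σ = √(p(1 − p)/n), with n the total number of responses in the cell; the function name `cell_stats` is hypothetical. In the cells checked, the numbers printed in the tables agree with σ itself (for example, 10 errors in 151 responses give σ ≈ 0.0202, the value shown for NN, D-Amp, High, 6), and the cells marked NV are those whose p − 2σ is not above 0, including every zero-error cell.

```python
import math


def cell_stats(errors, total):
    """Error rate p, its binomial standard error sigma, and whether the
    interval p +/- 2*sigma lies strictly inside (0, 1)."""
    p = errors / total
    sigma = math.sqrt(p * (1 - p) / total)
    valid = (p - 2 * sigma) > 0 and (p + 2 * sigma) < 1
    return p, sigma, valid


# Two NN, D-Amp cells: High/2 (1 of 154) and High/6 (10 of 151)
for errors, total in [(1, 154), (10, 151)]:
    p, sigma, valid = cell_stats(errors, total)
    print(round(p, 4), round(sigma, 4) if valid else "NV")
# 0.0065 NV
# 0.0662 0.0202
```

Using a strict inequality is a deliberate assumption here: it makes cells with no errors (p = σ = 0) fail the check, matching their NV entries.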

d'  Values 


The bottom part of each table deals with the calculation of d' values
and is divided into three parts, labeled "REF", "MODS", and "ORIGIN".
MODS and ORIGIN treat the two factors organizing the edited stimuli,
contrast reduction and original contrast, respectively. REF treats the
unedited boundary tokens, which provide the reference point from which the
d' distances are calculated on the two factors.

Under the REF label is the label "FA", as a reminder that the raw
boundary tokens function as noise stimuli, providing the false-alarm rate.
Under both MODS and ORIGIN is the label "H", as a reminder that the
stimuli of the factorial array function as signal stimuli and provide hit rates.

Accompanying FA and H are the labels "z" and "var". These indicate the rows
containing, for "z", the z-scores of the corresponding error rates from the
top part of the table, and, for "var", the statistical variance of that
z-score, according to the formula (Gourevitch & Galanter, 1967, reported in
Macmillan & Creelman, 1991, p. 271):

$$\operatorname{var}[z(p)] = \frac{p(1-p)}{N\,[\phi(p)]^{2}}$$

where N is the number of responses on which the proportion p is based, and

$$\phi(p) = (2\pi)^{-1/2}\exp\!\left[-0.5\,z(p)^{2}\right]$$
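The variance formula can be checked numerically. This is a hedged Python sketch that uses the standard library's `statistics.NormalDist` for z(p) and for the normal density φ; the function name `z_and_var` is hypothetical. With the NN, D-Amp boundary-token rate (8 false alarms in 310 responses) it reproduces the REF z and var shown in the first "by GROUP" table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_and_var(p, n):
    """Return z(p) and var[z(p)] = p(1 - p) / (n * phi(p)**2),
    where phi(p) is the standard normal density evaluated at z(p)."""
    z = STANDARD_NORMAL.inv_cdf(p)
    phi = STANDARD_NORMAL.pdf(z)
    return z, p * (1 - p) / (n * phi ** 2)


z, var = z_and_var(8 / 310, 310)
print(round(z, 3), round(var, 4))  # -1.946 0.0225
```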




The z and var statistics for REF are in the fourth ("Pool") column,
immediately below the pooled boundary-token error-rate data from which
they were calculated. The MODS z and var are in the first three columns;
they were calculated from the error-rate data in the "Pool" meta-row
immediately above them, so they represent the Δ2, Δ4, and Δ6 levels of
contrast reduction. The ORIGIN z and var are also in the first three
columns, but they were calculated from the error-rate data in the first three
meta-rows of the fourth ("Pool") column, so they represent the High, Med,
and Low levels of original contrast.

Below the z and var in MODS and ORIGIN are four rows of the d'
sensitivity figures, which are the ultimate goal of this project. Since a d' is
calculated by subtracting the z-score of a false-alarm rate from that of a hit
rate, there is a d' value corresponding to each z in MODS and ORIGIN: each
is obtained by subtracting the REF z from the MODS or ORIGIN z in turn.
Like the z-scores above them, the d' values represent the Δ2, Δ4, and Δ6
levels of contrast reduction (for MODS) and the High, Med, and Low levels
of original contrast (for ORIGIN).
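A minimal sketch of the d' step, again assuming `statistics.NormalDist` for the z transform (the function name `d_prime` is hypothetical). It takes the NN, D-Amp pooled error rates for the 2, 4, and 6 SD reductions as hit rates and the pooled boundary-token rate as the false-alarm rate.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """d' = z(H) - z(FA), with z the standard normal quantile function."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


false_alarm_rate = 8 / 310  # NN, D-Amp, pooled boundary tokens (REF)
for errors, total in [(8, 463), (22, 465), (43, 460)]:  # MODS: 2, 4, 6 SD
    print(round(d_prime(errors / total, false_alarm_rate), 3))
```

The printed values agree with the MODS d' row of that table (-0.167, 0.2748, 0.6267) to within rounding.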

The d' values as calculated are point estimates. The three rows under
the d' row deal with confidence intervals on the true value of d' (again, cf.
Gourevitch & Galanter, 1967, via Macmillan & Creelman, 1991). The first of
the three rows, labeled "var", is the variance of d', calculated as the sum of
the variances of the two component z-scores. The remaining two rows
are the left and right endpoints of the 95% confidence interval, calculated as

$$d' \pm z_{\alpha/2}\sqrt{\operatorname{var}(d')}$$

where $z_{\alpha/2}$ equals 1.96 for a 95% confidence interval.
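A short Python sketch of the interval computation, assuming only the sum-of-variances rule and the 1.96 multiplier described above (`d_prime_ci` is a hypothetical name). The inputs are the rounded NN, D-Amp, MODS 2-SD values from the first "by GROUP" table.

```python
import math


def d_prime_ci(d, var_hit_z, var_fa_z, z_crit=1.96):
    """Return var(d') and the confidence limits d' -/+ z_crit * sqrt(var(d')),
    with var(d') = var[z(H)] + var[z(FA)]."""
    var_d = var_hit_z + var_fa_z
    half_width = z_crit * math.sqrt(var_d)
    return var_d, d - half_width, d + half_width


# NN, D-Amp, MODS 2-SD column: d' = -0.167, var[z(H)] = 0.0201, var[z(FA)] = 0.0225
var_d, low, high = d_prime_ci(-0.167, 0.0201, 0.0225)
print(round(var_d, 4), round(low, 3), round(high, 3))  # 0.0426 -0.572 0.238
```

The table gives 0.2373 for the upper limit; the small difference in the last digit presumably arises because the table was computed from unrounded d' and variance values.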


by GROUP   D-Amp

                NN                          |      LN
                 2       4       6    Pool  |       2       4       6    Pool
High             1       3      10      14  |       1       0       1       2
               154     157     151     462  |      74      76      73     223
            0.0065  0.0191  0.0662  0.0303  |  0.0135       0  0.0137   0.009
                NV      NV  0.0202   0.008  |      NV      NV      NV      NV
Med              2       2      28      32  |       0       1       7       8
               156     154     157     467  |      77      80      76     233
            0.0128   0.013  0.1783  0.0685  |       0  0.0125  0.0921  0.0343
                NV      NV  0.0306  0.0117  |      NV      NV  0.0332  0.0119
Low              5      17       5      27  |       1       7       0       8
               153     154     152     459  |      75      74      77     226
            0.0327  0.1104  0.0329  0.0588  |  0.0133  0.0946       0  0.0354
            0.0144  0.0253  0.0145   0.011  |      NV   0.034      NV  0.0123
Pool             8      22      43      73  |       2       8       8      18
               463     465     460    1388  |     226     230     226     682
            0.0173  0.0473  0.0935  0.0526  |  0.0088  0.0348  0.0354  0.0264
            0.0061  0.0098  0.0136   0.006  |      NV  0.0121  0.0123  0.0061
Bounds           5       3               8  |       0       0             0.5
               156     154             310  |      73      74             147
            0.0321  0.0195          0.0258  |       0       0          0.0034
            0.0141      NV           0.009  |      NV      NV              NV

REF FA z                            -1.946  |                          -2.706
       var                          0.0225  |                          0.2198
MODS H z    -2.114  -1.671   -1.32          |  -2.372  -1.815  -1.807
       var  0.0201    0.01  0.0066          |  0.0677  0.0247  0.0248
       d'   -0.167  0.2748  0.6267          |  0.3345  0.8916  0.8996
       var  0.0426  0.0325  0.0291          |  0.2874  0.2445  0.2446
      95%L  -0.572  -0.078  0.2923          |  -0.716  -0.077   -0.07
      95%R  0.2373   0.628  0.9612          |  1.3853  1.8608   1.869
ORIGIN H z  -1.876  -1.487  -1.565          |  -2.367  -1.821  -1.807
       var  0.0135  0.0078  0.0088          |  0.0679  0.0246  0.0248
       d'     0.07  0.4595  0.3816          |  0.3395  0.8858  0.8996
       var   0.036  0.0303  0.0313          |  0.2877  0.2444  0.2446
      95%L  -0.302   0.118   0.035          |  -0.712  -0.083   -0.07
      95%R   0.442  0.8009  0.7283          |  1.3907  1.8547   1.869

by GROUP   D-Amp

                NF                          |      LF
                 2       4       6    Pool  |       2       4       6    Pool
High             1       1      11      13  |       0       2       1       3
                81      80      82     243  |      68      67      70     205
            0.0123  0.0125  0.1341  0.0535  |       0  0.0299  0.0143  0.0146
                NV      NV  0.0376  0.0144  |      NV      NV      NV      NV
Med              4       2      16      22  |       1       0      13      14
                81      85      80     246  |      65      65      67     197
            0.0494  0.0235     0.2  0.0894  |  0.0154       0   0.194  0.0711
            0.0241      NV  0.0447  0.0182  |      NV      NV  0.0483  0.0183
Low              0       9       4      13  |       0       6       1       7
                84      80      86     250  |      69      69      68     206
                 0  0.1125  0.0465   0.052  |       0   0.087  0.0147   0.034
                NV  0.0353  0.0227   0.014  |      NV  0.0339      NV  0.0126
Pool             5      12      31      48  |       1       8      15      24
               246     245     248     739  |     202     201     205     608
            0.0203   0.049   0.125   0.065  |   0.005  0.0398  0.0732  0.0395
             0.009  0.0138   0.021  0.0091  |      NV  0.0138  0.0182  0.0079
Bounds           3       4               7  |       1       1               2
                80      81             161  |      65      66             131
            0.0375  0.0494          0.0435  |  0.0154  0.0152          0.0153
                NV  0.0241          0.0161  |      NV      NV              NV

REF FA z                            -1.712  |                          -2.163
       var                          0.0304  |                          0.0776
MODS H z    -2.047  -1.655   -1.15          |  -2.579  -1.753  -1.453
       var  0.0336  0.0185  0.0104          |  0.1187  0.0258  0.0171
       d'   -0.335  0.0568  0.5613          |  -0.416  0.4101  0.7105
       var   0.064  0.0489  0.0408          |  0.1963  0.1034  0.0948
      95%L  -0.831  -0.376  0.1654          |  -1.285   -0.22  0.1071
      95%R  0.1604  0.4901  0.9572          |  0.4523  1.0405  1.3139
ORIGIN H z  -1.612  -1.344  -1.626          |   -2.18  -1.468  -1.825
       var  0.0176  0.0127  0.0174          |  0.0512  0.0182   0.028
       d'   0.0998  0.3674  0.0859          |  -0.017  0.6952  0.3378
       var   0.048  0.0431  0.0478          |  0.1288  0.0958  0.1056
      95%L   -0.33  -0.039  -0.343          |   -0.72  0.0886  -0.299
      95%R  0.5292  0.7741  0.5145          |  0.6867  1.3018  0.9749

by GROUP   D-Dur

                NN                          |      LN
                 2       4       6    Pool  |       2       4       6    Pool
High             2       4       2       8  |       1       0       2       3
               157     155     152     464  |      75      75      75     225
            0.0127  0.0258  0.0132  0.0172  |  0.0133       0  0.0267  0.0133
                NV  0.0127      NV   0.006  |      NV      NV      NV      NV
Med              2       5       2       9  |       1       2       1       4
               154     156     152     462  |      75      77      73     225
             0.013  0.0321  0.0132  0.0195  |  0.0133   0.026  0.0137  0.0178
                NV  0.0141      NV  0.0064  |      NV      NV      NV  0.0088
Low              4       2       2       8  |       0       0       1       1
               156     153     153     462  |      74      75      79     228
            0.0256  0.0131  0.0131  0.0173  |       0       0  0.0127  0.0044
            0.0127      NV      NV  0.0061  |      NV      NV      NV      NV
Pool             8      11       6      25  |       2       2       4       8
               467     464     457    1388  |     224     227     227     678
            0.0171  0.0237  0.0131   0.018  |  0.0089  0.0088  0.0176  0.0118
             0.006  0.0071  0.0053  0.0036  |      NV      NV  0.0087  0.0041
Bounds           3       6               9  |       0       4               4
               154     159             313  |      78      78             156
            0.0195  0.0377          0.0288  |       0  0.0513          0.0256
                NV  0.0151          0.0094  |      NV   0.025          0.0127

REF FA z                            -1.899  |                          -1.949
       var                          0.0207  |                          0.0449
MODS H z    -2.117  -1.983  -2.222          |  -2.369  -2.373  -2.106
       var    0.02   0.016  0.0249          |  0.0678  0.0676  0.0404
       d'   -0.218  -0.083  -0.323          |  -0.419  -0.424  -0.156
       var  0.0407  0.0366  0.0456          |  0.1127  0.1125  0.0853
      95%L  -0.613  -0.458  -0.741          |  -1.078  -1.082  -0.729
      95%R  0.1779   0.292  0.0954          |  0.2387  0.2331   0.416
ORIGIN H z  -2.114  -2.065  -2.113          |  -2.216  -2.102  -2.621
       var  0.0201  0.0184  0.0201          |  0.0499  0.0404  0.1157
       d'   -0.215  -0.165  -0.213          |  -0.267  -0.153  -0.672
       var  0.0407  0.0391  0.0408          |  0.0949  0.0854  0.1607
      95%L  -0.611  -0.553  -0.609          |  -0.871  -0.726  -1.457
      95%R  0.1806  0.2225  0.1825          |  0.3365  0.4199  0.1139

by GROUP   D-Dur

                NF                          |      LF
                 2       4       6    Pool  |       2       4       6    Pool
High             4       4       6      14  |       0       1       2       3
                82      80      81     243  |      66      68      69     203
            0.0488    0.05  0.0741  0.0576  |       0  0.0147   0.029  0.0148
            0.0238  0.0244  0.0291  0.0149  |      NV      NV      NV      NV
Med              2       3       1       6  |       1       2       1       4
                84      81      80     245  |      69      65      69     203
            0.0238   0.037  0.0125  0.0245  |  0.0145  0.0308  0.0145  0.0197
                NV      NV      NV  0.0099  |      NV      NV      NV  0.0098
Low              1       2       5       8  |       2       2       3       7
                81      83      85     249  |      68      69      67     204
            0.0123  0.0241  0.0588  0.0321  |  0.0294   0.029  0.0448  0.0343
                NV      NV  0.0255  0.0112  |      NV      NV      NV  0.0127
Pool             7       9      12      28  |       3       5       6      14
               247     244     246     737  |     203     202     205     610
            0.0283  0.0369  0.0488   0.038  |  0.0148  0.0248  0.0293   0.023
            0.0106  0.0121  0.0137   0.007  |      NV  0.0109  0.0118  0.0061
Bounds           0       0             0.5  |       0       3               3
                83      80             163  |      66      63             129
                 0       0          0.0031  |       0  0.0476          0.0233
                NV      NV              NV  |      NV      NV              NV

REF FA z                             -2.74  |                          -1.991
       var                          0.2153  |                          0.0582
MODS H z    -1.906  -1.788  -1.657          |  -2.176  -1.964  -1.892
       var  0.0265  0.0224  0.0184          |  0.0513  0.0356  0.0312
       d'   0.8347  0.9525  1.0837          |  -0.185  0.0265  0.0991
       var  0.2418  0.2377  0.2338          |  0.1095  0.0938  0.0894
      95%L  -0.129  -0.003   0.136          |  -0.834  -0.574  -0.487
      95%R  1.7985  1.9081  2.0314          |  0.4634  0.6267  0.6851
ORIGIN H z  -1.575  -1.969   -1.85          |  -2.176   -2.06  -1.821
       var  0.0168  0.0296  0.0241          |  0.0513  0.0416  0.0281
       d'   1.1654  0.7717  0.8901          |  -0.185  -0.069  0.1699
       var  0.2321  0.2449  0.2394          |  0.1095  0.0998  0.0863
      95%L  0.2211  -0.198  -0.069          |  -0.834  -0.688  -0.406
      95%R  2.1097  1.7416  1.8491          |  0.4634  0.5501  0.7457

by GROUP   T-Amp

            NN                                  LN
            2       4       6       Pool        2       4       6       Pool
High        4       5       7       16          0       0       2       2
            155     153     151     459         73      77      75      225
            0.0258  0.0327  0.0464  0.0349      0       0       0.0267  0.0089
            0.0127  0.0144  0.0171  0.0086      NV      NV      NV      NV
Med         0       4       3       7           0       1       1       2
            152     153     151     456         75      77      73      225
            0       0.0261  0.0199  0.0154      0       0.013   0.0137  0.0089
            NV      0.0129  NV      0.0058      NV      NV      NV      NV
Low         1       4       4       9           0       3       1       4
            153     152     151     456         79      75      75      229
            0.0065  0.0263  0.0265  0.0197      0       0.04    0.0133  0.0175
            NV      0.013   0.0131  0.0065      NV      NV      NV      0.0087
Pool        5       13      14      32          0.5     4       4       8
            460     458     453     1371        227     229     223     679
            0.0109  0.0284  0.0309  0.0233      0.0022  0.0175  0.0179  0.0118
            0.0048  0.0078  0.0081  0.0041      NV      0.0087  0.0089  0.0041
Bounds      4       2       6                   1       0       1
            153     153     306                 80      76      156
            0.0261  0.0131  0.0196              0.0125  0       0.0064
            0.0129  NV      0.0079              NV      NV      NV

                NN                          LN
REF     FA z    -2.062                      -2.489
        var     0.0277                      0.1256
MODS    H z     -2.295  -1.905  -1.868      -2.848  -2.109  -2.098
        var     0.0285  0.0143  0.0136      0.2022  0.0403  0.0406
        d'      -0.233  0.1568  0.1943      -0.359  0.3796  0.3904
        var     0.0562  0.042   0.0413      0.3278  0.1659  0.1662
        95%L    -0.697  -0.245  -0.204      -1.481  -0.419  -0.409
        95%R    0.2315  0.5584  0.5926      0.7633  1.1778  1.1893
ORIGIN  H z     -1.814  -2.161  -2.059      -2.37   -2.37   -2.109
        var     0.0124  0.0222  0.0185      0.0677  0.0677  0.0403
        d'      1.1643  0.7707  0.8891      -0.185  -0.069  0.1699
        var     0.2322  0.245   0.2395      0.1095  0.0998  0.0863
        95%L    0.2198  -0.199  -0.07       -0.834  -0.688  -0.406
        95%R    2.1089  1.7409  1.8483      0.4634  0.5501  0.7457
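The appendix does not restate how the cell entries were derived, but the printed values are consistent with the following reading: each cell stacks a response count, the number of trials, their ratio p, and the binomial standard error sqrt(p(1 - p)/n), with NV where no standard error is reported; each Pool entry sums counts and trials over the High, Med, and Low rows, and a pooled count of 0 appears to be replaced by 0.5 (LN, column 2: 0.5/227 = 0.0022). The Python sketch below checks that reading against column 2 of the T-Amp table above. The function names are illustrative, and the 0.5 substitution is inferred from the numbers rather than stated in the text.

```python
import math


def cell_stats(count: float, trials: int) -> tuple[float, float]:
    """Proportion and binomial standard error for one cell (assumed reading)."""
    p = count / trials
    se = math.sqrt(p * (1 - p) / trials)
    return round(p, 4), round(se, 4)


def pool(cells: list[tuple[int, int]]) -> tuple[float, int]:
    """Sum counts and trials over rows; a pooled count of 0 becomes 0.5 (inferred)."""
    count = sum(c for c, _ in cells)
    trials = sum(n for _, n in cells)
    return (count if count > 0 else 0.5), trials


# Column "2" of the T-Amp table: (count, trials) for the High, Med, Low rows.
nn_col2 = [(4, 155), (0, 152), (1, 153)]
ln_col2 = [(0, 73), (0, 75), (0, 79)]

print(cell_stats(4, 155))             # NN High: (0.0258, 0.0127)
print(pool(nn_col2), pool(ln_col2))   # (5, 460) (0.5, 227)
print(cell_stats(*pool(nn_col2)))     # NN Pool: (0.0109, 0.0048)
print(cell_stats(*pool(ln_col2))[0])  # LN Pool proportion: 0.0022
```

The same functions also reproduce, for example, the LN Pool entry of the Pool column (8 of 679 trials: 0.0118 and 0.0041).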

by GROUP   T-Amp

            NF                                  LF
            2       4       6       Pool        2       4       6       Pool
High        0       2       3       5           0       1       2       3
            81      83      82      246         70      69      69      208
            0       0.0241  0.0366  0.0203      0       0.0145  0.029   0.0144
            NV      NV      NV      0.009       NV      NV      NV      NV
Med         0       0       2       2           0       0       0       0.5
            84      83      82      249         69      68      69      206
            0       0       0.0244  0.008       0       0       0       0.0024
            NV      NV      NV      NV          NV      NV      NV      NV
Low         1       3       2       6           0       0       2       2
            84      83      84      251         67      70      69      206
            0.0119  0.0361  0.0238  0.0239      0       0       0.029   0.0097
            NV      NV      NV      0.0096      NV      NV      NV      NV
Pool        1       5       7       13          0.5     1       4       5.5
            249     249     248     746         206     207     207     620
            0.004   0.0201  0.0282  0.0174      0.0024  0.0048  0.0193  0.0089
            NV      0.0089  0.0105  0.0048      NV      NV      0.0096  0.0038
Bounds      0       0       0.5                 0       0       0.5
            87      80      167                 66      66      132
            0       0       0.003               0       0       0.0038
            NV      NV      NV                  NV      NV      NV

                NF                          LF
REF     FA z    -2.748                      -2.67
        var     0.2143                      0.2246
MODS    H z     -2.651  -2.052  -1.908      -2.817  -2.588  -2.068
        var     0.1136  0.0335  0.0264      0.2058  0.1181  0.0414
        d'      0.0977  0.6963  0.8409      -0.146  0.0827  0.6025
        var     0.3279  0.2478  0.2407      0.4304  0.3427  0.266
        95%L    -1.025  -0.279  -0.121      -1.432  -1.065  -0.408
        95%R    1.2201  1.6719  1.8025      1.1398  1.2301  1.6133
ORIGIN  H z     -2.047  -2.407  -1.979      -2.186  -2.817  -2.337
        var     0.0336  0.0661  0.0293      0.051   0.2058  0.0692
        d'      0.7013  0.341   0.7694      0.4848  -0.146  0.333
        var     0.2479  0.2804  0.2436      0.2756  0.4304  0.2938
        95%L    -0.274  -0.697  -0.198      -0.544  -1.432  -0.729
        95%R    1.6772  1.3789  1.7368      1.5137  1.1398  1.3953
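The REF, MODS, and ORIGIN blocks are likewise consistent with the usual yes/no computation of d', although the appendix does not spell it out. On that reading, each rate is converted to a z score with the inverse of the standard normal distribution; d' is the hit z minus the reference false-alarm z (REF, from the pooled Bounds entry); the variance of each z is approximated as p(1 - p)/(n φ(z)²), with φ the standard normal density; the two variances are added; and the 95% limits are d' ± 1.96 sqrt(var). MODS applies this to the Pool-row rates of columns 2, 4, and 6, and ORIGIN appears to apply it to the Pool-column rates of the High, Med, and Low rows. The Python sketch below is a check of that assumed procedure, using the NF column 4 entries of the T-Amp table above (5 of 249 trials; reference 0.5 of 167). The function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_and_var(count: float, trials: int) -> tuple[float, float]:
    """z score of a rate and its approximate sampling variance."""
    p = count / trials
    z = STD_NORMAL.inv_cdf(p)
    var = p * (1 - p) / (trials * STD_NORMAL.pdf(z) ** 2)
    return z, var


def d_prime(hits: float, hit_trials: int, fas: float, fa_trials: int,
            crit: float = 1.96) -> tuple[float, float, float, float]:
    """d' = z(H) - z(FA), its variance, and the lower and upper 95% limits."""
    z_h, var_h = z_and_var(hits, hit_trials)
    z_f, var_f = z_and_var(fas, fa_trials)
    d = z_h - z_f
    var = var_h + var_f
    half_width = crit * var ** 0.5
    return d, var, d - half_width, d + half_width


# NF, column 4: Pool-row hits 5 of 249; reference (pooled Bounds) 0.5 of 167.
d, var, lower, upper = d_prime(5, 249, 0.5, 167)
print(f"d' = {d:.3f}, var = {var:.3f}, 95% limits = [{lower:.3f}, {upper:.3f}]")
```

On these inputs it prints d' = 0.696, var = 0.248, and limits of -0.279 and 1.672, which agree with the printed 0.6963, 0.2478, -0.279, and 1.6719 to within rounding.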

by GROUP   T-Dur

            NN                                  LN
            2       4       6       Pool        2       4       6       Pool
High        4       2       4       10          1       0       1       2
            157     152     154     463         75      73      76      224
            0.0255  0.0132  0.026   0.0216      0.0133  0       0.0132  0.0089
            0.0126  NV      0.0128  0.0068      NV      NV      NV      NV
Med         5       2       12      19          0       4       2       6
            152     149     153     454         71      77      78      226
            0.0329  0.0134  0.0784  0.0419      0       0.0519  0.0256  0.0265
            0.0145  NV      0.0217  0.0094      NV      0.0253  NV      0.0107
Low         0       6       24      30          1       1       9       11
            152     153     153     458         71      74      76      221
            0       0.0392  0.1569  0.0655      0.0141  0.0135  0.1184  0.0498
            NV      0.0157  0.0294  0.0116      NV      NV      0.0371  0.0146
Pool        9       10      40      59          2       5       12      19
            461     454     460     1375        217     224     230     671
            0.0195  0.022   0.087   0.0429      0.0092  0.0223  0.0522  0.0283
            0.0064  0.0069  0.0131  0.0055      NV      0.0099  0.0147  0.0064
Bounds      1       2       3                   0       1       1
            157     156     313                 78      79      157
            0.0064  0.0128  0.0096              0       0.0127  0.0064
            NV      NV      NV                  NV      NV      NV

                NN                          LN
REF     FA z    -2.342                      -2.491
        var     0.046                       0.1254
MODS    H z     -2.064  -2.014  -1.36       -2.357  -2.008  -1.624
        var     0.0185  0.0172  0.0069      0.0683  0.0345  0.0189
        d'      0.2785  0.3286  0.9825      0.1342  0.483   0.8669
        var     0.0644  0.0632  0.0529      0.1938  0.1599  0.1443
        95%L    -0.219  -0.164  0.5318      -0.729  -0.301  0.1223
        95%R    0.776   0.8213  1.4332      0.997   1.2669  1.6115
ORIGIN  H z     -2.022  -1.73   -1.51       -2.369  -1.934  -1.647
        var     0.0171  0.0111  0.0082      0.0678  0.0303  0.0203
        d'      0.3204  0.6126  0.8321      0.1224  0.5569  0.8439
        var     0.0631  0.057   0.0542      0.1932  0.1557  0.1457
        95%L    -0.172  0.1445  0.3758      -0.739  -0.217  0.0958
        95%R    0.8127  1.0807  1.2884      0.984   1.3303  1.5921

by GROUP   T-Dur

            NF                                  LF
            2       4       6       Pool        2       4       6       Pool
High        0       2       1       3           1       2       0       3
            80      85      79      244         67      71      67      205
            0       0.0235  0.0127  0.0123      0.0149  0.0282  0       0.0146
            NV      NV      NV      NV          NV      NV      NV      NV
Med         2       5       3       10          0       0       3       3
            83      82      83      248         70      68      65      203
            0.0241  0.061   0.0361  0.0403      0       0       0.0462  0.0148
            NV      0.0264  NV      0.0125      NV      NV      NV      NV
Low         1       3       6       10          0       0       3       3
            83      82      81      246         71      68      67      206
            0.012   0.0366  0.0741  0.0407      0       0       0.0448  0.0146
            NV      NV      0.0291  0.0126      NV      NV      NV      NV
Pool        3       10      10      23          1       2       6       9
            246     249     243     738         208     207     199     614
            0.0122  0.0402  0.0412  0.0312      0.0048  0.0097  0.0302  0.0147
            NV      0.0124  0.0127  0.0064      NV      NV      0.0121  0.0049
Bounds      1       1       2                   1       1       2
            82      83      165                 66      65      131
            0.0122  0.012   0.0121              0.0152  0.0154  0.0153
            NV      NV      NV                  NV      NV      NV

                NF                          LF
REF     FA z    -2.253                      -2.163
        var     0.0731                      0.0776
MODS    H z     -2.251  -1.749  -1.737      -2.589  -2.339  -1.879
        var     0.0488  0.0207  0.0209      0.118   0.0691  0.0315
        d'      0.0023  0.5044  0.5158      -0.426  -0.176  0.2845
        var     0.1219  0.0938  0.094       0.1956  0.1467  0.1091
        95%L    -0.682  -0.096  -0.085      -1.293  -0.927  -0.363
        95%R    0.6867  1.1048  1.1167      0.4406  0.5747  0.9319
ORIGIN  H z     -2.248  -1.747  -1.743      -2.18   -2.176  -2.182
        var     0.0489  0.0207  0.0208      0.0512  0.0513  0.0511
        d'      0.0055  0.5063  0.5101      -0.017  -0.013  -0.019
        var     0.122   0.0938  0.0939      0.1288  0.1289  0.1287
        95%L    -0.679  -0.094  -0.091      -0.72   -0.717  -0.722
        95%R    0.6901  1.1067  1.1107      0.6867  0.6909  0.6846

by GROUP POOLS   D-Amp

            All Natives                         All Foreign
            2       4       6       Pool        2       4       6       Pool
High        2       3       11      16          1       3       12      16
            228     233     224     685         149     147     152     448
            0.0088  0.0129  0.0491  0.0234      0.0067  0.0204  0.0789  0.0357
            NV      NV      0.0144  0.0058      NV      NV      0.0219  0.0088
Med         2       3       35      40          5       2       29      36
            233     234     233     700         146     150     147     443
            0.0086  0.0128  0.1502  0.0571      0.0342  0.0133  0.1973  0.0813
            NV      NV      0.0234  0.0088      0.0151  NV      0.0328  0.013
Low         6       24      5       35          0       15      5       20
            228     228     229     685         153     149     154     456
            0.0263  0.1053  0.0218  0.0511      0       0.1007  0.0325  0.0439
            0.0106  0.0203  0.0097  0.0084      NV      0.0247  0.0143  0.0096
Pool        10      30      51      91          6       20      46      72
            689     695     686     2070        448     446     453     1347
            0.0145  0.0432  0.0743  0.044       0.0134  0.0448  0.1015  0.0535
            0.0046  0.0077  0.01    0.0045      0.0054  0.0098  0.0142  0.0061
Bounds      5       3       8                   4       5       9
            229     228     457                 145     147     292
            0.0218  0.0132  0.0175              0.0276  0.034   0.0308
            0.0097  NV      0.0061              0.0136  0.015   0.0101

                All Natives                 All Foreign
REF     FA z    -2.108                      -1.869
        var     0.0201                      0.0211
MODS    H z     -2.183  -1.715  -1.444      -2.215  -1.697  -1.273
        var     0.0153  0.0071  0.0051      0.025   0.0107  0.0064
        d'      -0.075  0.3931  0.664       -0.346  0.1718  0.5961
        var     0.0355  0.0272  0.0252      0.0461  0.0319  0.0275
        95%L    -0.444  0.0698  0.3528      -0.767  -0.178  0.2709
        95%R    0.2942  0.7165  0.9753      0.0752  0.5217  0.9212
ORIGIN  H z     -1.989  -1.579  -1.634      -1.803  -1.397  -1.708
        var     0.0109  0.0059  0.0064      0.0125  0.0074  0.0107
        d'      0.1194  0.529   0.4739      0.0661  0.4722  0.1613
        var     0.0311  0.026   0.0266      0.0336  0.0286  0.0318
        95%L    -0.226  0.213   0.1544      -0.293  0.1409  -0.188
        95%R    0.4648  0.845   0.7934      0.4253  0.8036  0.5108

by GROUP POOLS   D-Dur

            All Natives                         All Foreign
            2       4       6       Pool        2       4       6       Pool
High        3       4       4       11          4       5       8       17
            232     230     227     689         148     148     150     446
            0.0129  0.0174  0.0176  0.016       0.027   0.0338  0.0533  0.0381
            NV      0.0086  0.0087  0.0048      0.0133  0.0149  0.0183  0.0091
Med         3       7       3       13          3       5       2       10
            229     233     225     687         153     146     149     448
            0.0131  0.03    0.0133  0.0189      0.0196  0.0342  0.0134  0.0223
            NV      0.0112  NV      0.0052      NV      0.0151  NV      0.007
Low         4       2       3       9           3       4       8       15
            230     228     232     690         149     152     152     453
            0.0174  0.0088  0.0129  0.013       0.0201  0.0263  0.0526  0.0331
            0.0086  NV      NV      0.0043      NV      0.013   0.0181  0.0084
Pool        10      13      10      33          10      14      18      42
            691     691     684     2066        450     446     451     1347
            0.0145  0.0188  0.0146  0.016       0.0222  0.0314  0.0399  0.0312
            0.0045  0.0052  0.0046  0.0028      0.0069  0.0083  0.0092  0.0047
Bounds      3       10      13                  0       3       3
            232     237     469                 149     143     292
            0.0129  0.0422  0.0277              0       0.021   0.0103
            NV      0.0131  0.0076              NV      NV      NV

                All Natives                 All Foreign
REF     FA z    -1.915                      -2.316
        var     0.0142                      0.0468
MODS    H z     -2.184  -2.079  -2.18       -2.01   -1.861  -1.752
        var     0.0153  0.0126  0.0153      0.0172  0.0137  0.0115
        d'      -0.269  -0.163  -0.265      0.3063  0.4554  0.5645
        var     0.0295  0.0268  0.0295      0.064   0.0604  0.0582
        95%L    -0.605  -0.484  -0.601      -0.19   -0.026  0.0914
        95%R    0.0676  0.1574  0.0719      0.8022  0.9373  1.0375
ORIGIN  H z     -2.145  -2.077  -2.225      -1.773  -2.008  -1.837
        var     0.0143  0.0127  0.0166      0.012   0.0173  0.013
        d'      -0.23   -0.161  -0.309      0.5432  0.3082  0.4793
        var     0.0284  0.0268  0.0307      0.0587  0.064   0.0597
        95%L    -0.56   -0.482  -0.653      0.0682  -0.188  0.0003
        95%R    0.1007  0.1599  0.034       1.0183  0.8041  0.9583

by GROUP POOLS   T-Amp

            All Natives                         All Foreign
            2       4       6       Pool        2       4       6       Pool
High        4       5       9       18          0       3       5       8
            228     230     226     684         151     152     151     454
            0.0175  0.0217  0.0398  0.0263      0       0.0197  0.0331  0.0176
            0.0087  0.0096  0.013   0.0061      NV      NV      0.0146  0.0062
Med         0       5       4       9           0       0       2       2
            227     230     224     681         153     151     151     455
            0       0.0217  0.0179  0.0132      0       0       0.0132  0.0044
            NV      0.0096  0.0088  0.0044      NV      NV      NV      NV
Low         1       7       5       13          1       3       4       8
            232     227     226     685         151     153     153     457
            0.0043  0.0308  0.0221  0.019       0.0066  0.0196  0.0261  0.0175
            NV      0.0115  0.0098  0.0052      NV      NV      0.0129  0.0061
Pool        5       17      18      40          1       6       11      18
            687     687     676     2050        455     456     455     1366
            0.0073  0.0247  0.0266  0.0195      0.0022  0.0132  0.0242  0.0132
            0.0032  0.0059  0.0062  0.0031      NV      0.0053  0.0072  0.0031
Bounds      5       2       7                   0       0       0.5
            233     229     462                 153     146     299
            0.0215  0.0087  0.0152              0       0       0.0017
            0.0095  NV      0.0057              NV      NV      NV

                All Natives                 All Foreign
REF     FA z    -2.166                      -2.934
        var     0.0221                      0.1923
MODS    H z     -2.443  -1.964  -1.933      -2.848  -2.222  -1.974
        var     0.0259  0.0105  0.0101      0.1011  0.0249  0.0161
        d'      -0.277  0.2018  0.2333      0.0859  0.7127  0.9599
        var     0.048   0.0326  0.0322      0.2934  0.2172  0.2084
        95%L    -0.706  -0.152  -0.119      -0.976  -0.201  0.0651
        95%R    0.1522  0.5556  0.5852      1.1475  1.6262  1.8547
ORIGIN  H z     -1.938  -2.22   -2.075      -2.106  -2.62   -2.108
        var     0.0101  0.0166  0.0127      0.0202  0.0579  0.0201
        d'      0.2282  -0.054  0.0908      0.8286  0.3141  0.8259
        var     0.0322  0.0387  0.0348      0.2125  0.2502  0.2125
        95%L    -0.124  -0.439  -0.275      -0.075  -0.666  -0.078
        95%R    0.5799  0.3321  0.4565      1.7322  1.2946  1.7294

by GROUP POOLS   T-Dur

            All Natives                         All Foreign
            2       4       6       Pool        2       4       6       Pool
High        5       2       5       12          1       4       1       6
            232     225     230     687         147     156     146     449
            0.0216  0.0089  0.0217  0.0175      0.0068  0.0256  0.0068  0.0134
            0.0095  NV      0.0096  0.005       NV      0.0127  NV      0.0054
Med         5       6       14      25          2       5       6       13
            223     226     231     680         153     150     148     451
            0.0224  0.0265  0.0606  0.0368      0.0131  0.0333  0.0405  0.0288
            0.0099  0.0107  0.0157  0.0072      NV      0.0147  0.0162  0.0079
Low         1       7       33      41          1       3       9       13
            223     227     229     679         154     150     148     452
            0.0045  0.0308  0.1441  0.0604      0.0065  0.02    0.0608  0.0288
            NV      0.0115  0.0232  0.0091      NV      NV      0.0196  0.0079
Pool        11      15      52      78          4       12      16      32
            678     678     690     2046        454     456     442     1352
            0.0162  0.0221  0.0754  0.0381      0.0088  0.0263  0.0362  0.0237
            0.0049  0.0056  0.01    0.0042      0.0044  0.0075  0.0089  0.0041
Bounds      1       3       4                   2       2       4
            235     235     470                 148     148     296
            0.0043  0.0128  0.0085              0.0135  0.0135  0.0135
            NV      NV      0.0042              NV      NV      0.0067

                All Natives                 All Foreign
REF     FA z    -2.386                      -2.211
        var     0.0335                      0.0376
MODS    H z     -2.139  -2.012  -1.437      -2.373  -1.938  -1.797
        var     0.0143  0.0115  0.005       0.0338  0.0151  0.0125
        d'      0.2474  0.3745  0.9493      -0.162  0.2732  0.4145
        var     0.0479  0.045   0.0385      0.0714  0.0527  0.0501
        95%L    -0.181  -0.041  0.5646      -0.686  -0.177  -0.024
        95%R    0.6762  0.7903  1.334       0.3613  0.7231  0.8532
ORIGIN  H z     -2.109  -1.79   -1.552      -2.215  -1.898  -1.899
        var     0.0134  0.008   0.0058      0.025   0.0143  0.0143
        d'      0.2771  0.5967  0.8347      -0.004  0.3128  0.3118
        var     0.0469  0.0416  0.0394      0.0626  0.0519  0.0519
        95%L    -0.147  0.1971  0.4459      -0.495  -0.134  -0.135
        95%R    0.7018  0.9963  1.2235      0.4859  0.7594  0.7583

by VOWEL   D-Amp

            i                                   a
            2       4       6       Pool        2       4       6       Pool
High        1       4       20      25          1       0       2       3
            126     126     125     377         127     127     125     379
            0.0079  0.0317  0.16    0.0663      0.0079  0       0.016   0.0079
            NV      0.0156  0.0328  0.0128      NV      NV      NV      NV
Med         4       0       58      62          2       1       2       5
            125     129     127     381         127     127     127     381
            0.032   0       0.4567  0.1627      0.0157  0.0079  0.0157  0.0131
            0.0157  NV      0.0442  0.0189      NV      NV      NV      0.0058
Low         1       35      7       43          3       1       3       7
            128     126     129     383         127     125     126     378
            0.0078  0.2778  0.0543  0.1123      0.0236  0.008   0.0238  0.0185
            NV      0.0399  0.0199  0.0161      NV      NV      NV      0.0069
Pool        6       39      85      130         6       2       7       15
            379     381     381     1141        381     379     378     1138
            0.0158  0.1024  0.2231  0.1139      0.0157  0.0053  0.0185  0.0132
            0.0064  0.0155  0.0213  0.0094      0.0064  NV      0.0069  0.0034
Bounds      6       4       10                  2       1       3
            123     126     249                 126     125     251
            0.0488  0.0317  0.0402              0.0159  0.008   0.012
            0.0194  0.0156  0.0124              NV      NV      NV

                i                           a
REF     FA z    -1.749                      -2.259
        var     0.0207                      0.0486
MODS    H z     -2.149  -1.268  -0.762      -2.151  -2.557  -2.085
        var     0.0261  0.0076  0.0051      0.0261  0.0602  0.0234
        d'      -0.4    0.4806  0.987       0.1079  -0.298  0.1733
        var     0.0468  0.0283  0.0258      0.0747  0.1087  0.0719
        95%L    -0.824  0.151   0.6721      -0.428  -0.945  -0.352
        95%R    0.0244  0.8102  1.302       0.6434  0.3479  0.699
ORIGIN  H z     -1.504  -0.983  -1.215      -2.413  -2.223  -2.085
        var     0.0099  0.0059  0.0071      0.0439  0.0298  0.0234
        d'      0.245   0.7655  0.5343      -0.154  0.0361  0.1733
        var     0.0306  0.0266  0.0279      0.0925  0.0784  0.0719
        95%L    -0.098  0.4457  0.2071      -0.75   -0.513  -0.352
        95%R    0.5879  1.0853  0.8614      0.442   0.5849  0.699

by VOWEL   D-Amp

            u                                   All Vs
            2       4       6       Pool        2       4       6       Pool
High        1       2       1       4           3       6       23      32
            124     127     126     377         377     380     376     1133
            0.0081  0.0157  0.0079  0.0106      0.008   0.0158  0.0612  0.0282
            NV      NV      NV      0.0053      NV      0.0064  0.0124  0.0049
Med         1       4       4       9           7       5       64      76
            127     128     126     381         379     384     380     1143
            0.0079  0.0313  0.0317  0.0236      0.0185  0.013   0.1684  0.0665
            NV      0.0154  0.0156  0.0078      0.0069  0.0058  0.0192  0.0074
Low         2       3       0       5           6       39      10      55
            126     126     128     380         381     377     383     1141
            0.0159  0.0238  0       0.0132      0.0157  0.1034  0.0261  0.0482
            NV      0.0136  0       0.0058      0.0064  0.0157  0.0081  0.0063
Pool        4       9       5       18          16      50      97      163
            377     381     380     1138        1137    1141    1139    3417
            0.0106  0.0236  0.0132  0.0158      0.0141  0.0438  0.0852  0.0477
            0.0053  0.0078  0.0058  0.0037      0.0035  0.0061  0.0083  0.0036
Bounds      1       3       4                   9       8       17
            126     124     250                 375     375     750
            0.0079  0.0242  0.016               0.024   0.0213  0.0227
            NV      NV      0.0079              0.0079  0.0075  0.0054

                u                           All Vs
REF     FA z    -2.144                      -2.002
        var     0.0393                      0.0102
MODS    H z     -2.304  -1.984  -2.222      -2.195  -1.708  -1.371
        var     0.0354  0.0195  0.0299      0.0095  0.0043  0.0028
        d'      -0.16   0.1603  -0.077      -0.194  0.2936  0.6304
        var     0.0747  0.0588  0.0692      0.0197  0.0145  0.013
        95%L    -0.695  -0.315  -0.593      -0.469  0.0579  0.4068
        95%R    0.3759  0.6356  0.4384      0.0813  0.5293  0.854
ORIGIN  H z     -2.304  -1.984  -2.222      -1.907  -1.502  -1.663
        var     0.0354  0.0195  0.0299      0.0058  0.0033  0.004
        d'      -0.16   0.1603  -0.077      0.0943  0.4991  0.339
        var     0.0747  0.0588  0.0692      0.016   0.0135  0.0142
        95%L    -0.695  -0.315  -0.593      -0.153  0.2717  0.1054
        95%R    0.3759  0.6356  0.4384      0.3421  0.7265  0.5726

by VOWEL   D-Dur

            i                                   a
            2       4       6       Pool        2       4       6       Pool
High        4       1       5       10          1       5       5       11
            127     126     125     378         127     127     125     379
            0.0315  0.0079  0.04    0.0265      0.0079  0.0394  0.04    0.029
            0.0155  NV      0.0175  0.0083      NV      0.0173  0.0175  0.0086
Med         3       11      3       17          0       0       1       1
            129     127     125     381         127     127     126     380
            0.0233  0.0866  0.024   0.0446      0       0       0.0079  0.0026
            NV      0.025   NV      0.0106      NV      NV      NV      NV
Low         7       2       4       13          0       4       2       6
            126     128     127     381         127     126     129     382
            0.0556  0.0156  0.0315  0.0341      0       0.0317  0.0155  0.0157
            0.0204  NV      0.0155  0.0093      0       0.0156  NV      0.0064
Pool        14      14      12      40          1       9       8       18
            382     381     377     1140        381     380     380     1141
            0.0366  0.0367  0.0318  0.0351      0.0026  0.0237  0.0211  0.0158
            0.0096  0.0096  0.009   0.0054      NV      0.0078  0.0074  0.0037
Bounds      0       8       8                   3       2       5
            129     127     256                 127     126     253
            0       0.063   0.0313              0.0236  0.0159  0.0198
            NV      0.0216  0.0109              NV      NV      0.0088

                i                           a
REF     FA z    -1.863                      -2.059
        var     0.0239                      0.0333
MODS    H z     -1.791  -1.79   -1.855      -2.791  -1.983  -2.032
        var     0.0144  0.0144  0.016       0.1045  0.0195  0.0212
        d'      0.0718  0.073   0.0082      -0.733  0.0757  0.0262
        var     0.0382  0.0382  0.0399      0.1378  0.0528  0.0545
        95%L    -0.311  -0.31   -0.383      -1.46   -0.375  -0.432
        95%R    0.455   0.4562  0.3996      -0.005  0.5262  0.4839
ORIGIN  H z     -1.936  -1.699  -1.823      -1.895  -2.79   -2.152
        var     0.0181  0.0126  0.0151      0.017   0.1045  0.0261
        d'      -0.073  0.1633  0.0393      0.1633  -0.732  -0.093
        var     0.042   0.0365  0.039       0.0503  0.1379  0.0594
        95%L    -0.475  -0.211  -0.348      -0.276  -1.46   -0.571
        95%R    0.3288  0.5377  0.4263      0.6029  -0.004  0.3846

by VOWEL   D-Dur

            u                                   All Vs
            2       4       6       Pool        2       4       6       Pool
High        2       3       2       7           7       9       12      28
            126     125     127     378         380     378     377     1135
            0.0159  0.024   0.0157  0.0185      0.0184  0.0238  0.0318  0.0247
            NV      NV      NV      0.0069      0.0069  0.0078  0.009   0.0046
Med         3       1       1       5           6       12      5       23
            126     125     123     374         382     379     374     1135
            0.0238  0.008   0.0081  0.0134      0.0157  0.0317  0.0134  0.0203
            NV      NV      NV      0.0059      0.0064  0.009   0.0059  0.0042
Low         0       0       5       5           7       6       11      24
            126     126     128     380         379     380     384     1143
            0       0       0.0391  0.0132      0.0185  0.0158  0.0286  0.021
            NV      NV      0.0171  0.0058      0.0069  0.0064  0.0085  0.0042
Pool        5       4       8       17          20      27      28      75
            378     376     378     1132        1141    1137    1135    3413
            0.0132  0.0106  0.0212  0.015       0.0175  0.0237  0.0247  0.022
            0.0059  0.0053  0.0074  0.0036      0.0039  0.0045  0.0046  0.0025
Bounds      0       3       3                   3       13      16
            125     127     252                 381     380     761
            0       0.0236  0.0119              0.0079  0.0342  0.021
            NV      NV      NV                  NV      0.0093  0.0052

                u                           All Vs
REF     FA z    -2.26                       -2.033
        var     0.0485                      0.0106
MODS    H z     -2.219  -2.303  -2.03       -2.108  -1.982  -1.966
        var     0.0299  0.0354  0.0212      0.0081  0.0065  0.0063
        d'      0.0407  -0.043  0.2299      -0.075  0.0512  0.0674
        var     0.0784  0.0839  0.0698      0.0187  0.0171  0.0169
        95%L    -0.508  -0.611  -0.288      -0.342  -0.205  -0.188
        95%R    0.5896  0.5248  0.7476      0.1931  0.3075  0.3225
ORIGIN  H z     -2.085  -2.215  -2.222      -1.966  -2.048  -2.034
        var     0.0234  0.03    0.0299      0.0063  0.0073  0.0071
        d'      0.1748  0.0449  0.0387      0.0674  -0.015  -0.0005
        var     0.0719  0.0785  0.0784      0.0169  0.0179  0.0177
        95%L    -0.351  -0.504  -0.51       -0.188  -0.278  -0.261
        95%R    0.7004  0.594   0.5874      0.3225  0.2469  0.26

by VOWEL   T-Amp

            i                                   a
            2       4       6       Pool        2       4       6       Pool
High        0       2       1       3           3       3       8       14
            127     130     126     383         126     127     126     379
            0       0.0154  0.0079  0.0078      0.0238  0.0236  0.0635  0.0369
            NV      NV      NV      NV          NV      NV      0.0217  0.0097
Med         0       0       1       1           0       2       2       4
            129     125     125     379         126     127     125     378
            0       0       0.008   0.0026      0       0.0157  0.016   0.0106
            NV      NV      NV      NV          NV      NV      NV      0.0053
Low         1       4       2       7           0       4       6       10
            128     127     129     384         126     129     125     380
            0.0078  0.0315  0.0155  0.0182      0       0.031   0.048   0.0263
            NV      0.0155  NV      0.0068      NV      0.0153  0.0191  0.0082
Pool        1       6       4       11          3       9       16      28
            384     382     380     1146        378     383     376     1137
            0.0026  0.0157  0.0105  0.0096      0.0079  0.0235  0.0426  0.0246
            NV      0.0064  0.0052  0.0029      NV      0.0077  0.0104  0.0046
Bounds      1       0       1                   2       2       4
            129     125     254                 129     127     256
            0.0078  0       0.0039              0.0155  0.0157  0.0156
            NV      NV      NV                  NV      NV      0.0078

                i                           a
REF     FA z    -2.657                      -2.154
        var     0.1132                      0.0391
MODS    H z     -2.794  -2.152  -2.307      -2.412  -1.986  -1.722
        var     0.1043  0.0261  0.0353      0.044   0.0195  0.0132
        d'      -0.136  0.5056  0.3504      -0.258  0.1676  0.4321
        var     0.2175  0.1393  0.1485      0.083   0.0585  0.0523
        95%L    -1.051  -0.226  -0.405      -0.823  -0.307  -0.016
        95%R    0.7776  1.237   1.1056      0.3068  0.6417  0.8801
ORIGIN  H z     -2.417  -2.79   -2.092      -1.787  -2.305  -1.938
        var     0.0438  0.1046  0.0233      0.0144  0.0353  0.0181
        d'      0.2408  -0.132  0.5657      0.3665  -0.151  0.2159
        var     0.157   0.2177  0.1365      0.0534  0.0744  0.0572
        95%L    -0.536  -1.047  -0.158      -0.087  -0.686  -0.253
        95%R    1.0175  0.7824  1.2897      0.8196  0.3834  0.6846

by VOWEL   T-Amp

            u                                   All Vs
            2       4       6       Pool        2       4       6       Pool
High        1       3       5       9           4       8       14      26
            126     125     125     376         379     382     377     1138
            0.0079  0.024   0.04    0.0239      0.0106  0.0209  0.0371  0.0228
            NV      NV      0.0175  0.0079      0.0052  0.0073  0.0097  0.0044
Med         0       3       3       6           0       5       6       11
            125     129     125     379         380     381     375     1136
            0       0.0233  0.024   0.0158      0       0.0131  0.016   0.0097
            NV      NV      NV      0.0064      NV      0.0058  0.0065  0.0029
Low         1       2       1       4           2       10      9       21
            129     124     125     378         383     380     379     1142
            0.0078  0.0161  0.008   0.0106      0.0052  0.0263  0.0237  0.0184
            NV      NV      NV      0.0053      NV      0.0082  0.0078  0.004
Pool        2       8       9       19          6       23      29      58
            380     378     375     1133        1142    1143    1131    3416
            0.0053  0.0212  0.024   0.0168      0.0053  0.0201  0.0256  0.017
            NV      0.0074  0.0079  0.0038      0.0021  0.0042  0.0047  0.0022
Bounds      2       0       2                   5       2       7
            129     123     252                 387     375     762
            0.0155  0       0.0079              0.0129  0.0053  0.0092
            NV      NV      NV                  0.0057  NV      0.0035

                u                           All Vs
REF     FA z    -2.412                      -2.358
        var     0.066                       0.0195
MODS    H z     -2.558  -2.03   -1.977      -2.559  -2.051  -1.949
        var     0.0601  0.0212  0.0196      0.02    0.0073  0.0062
        d'      -0.146  0.3816  0.4345      -0.201  0.3068  0.4089
        var     0.1261  0.0872  0.0855      0.0395  0.0268  0.0257
        95%L    -0.842  -0.197  -0.139      -0.59   -0.014  0.0947
        95%R    0.5498  0.9603  1.0077      0.1891  0.6276  0.7232
ORIGIN  H z     -1.978  -2.149  -2.305      -1.998  -2.338  -2.088
        var     0.0196  0.0261  0.0353      0.0067  0.0126  0.0078
        d'      0.4333  0.2632  0.1068      0.3598  0.0196  0.2698
        var     0.0855  0.0921  0.1013      0.0262  0.0321  0.0273
        95%L    -0.14   -0.332  -0.517      0.0426  -0.331  -0.054
        95%R    1.0065  0.8579  0.7306      0.677   0.3707  0.5936

by VOWEL   T-Dur

            i                                   a
            2       4       6       Pool        2       4       6       Pool
High        1       0       0       1           3       3       1       7
            127     129     126     382         126     126     127     379
            0.0079  0       0       0.0026      0.0238  0.0238  0.0079  0.0185
            NV      NV      NV      NV          NV      NV      NV      0.0069
Med         3       3       4       10          2       2       9       13
            125     127     126     378         125     124     128     377
            0.024   0.0236  0.0317  0.0265      0.016   0.0161  0.0703  0.0345
            NV      NV      0.0156  0.0083      NV      NV      0.0226  0.0094
Low         0       4       6       10          1       4       17      22
            125     125     125     375         126     126     125     377
            0       0.032   0.048   0.0267      0.0079  0.0317  0.136   0.0584
            NV      0.0157  0.0191  0.0083      NV      0.0156  0.0307  0.0121
Pool        4       7       10      21          6       9       27      42
            377     381     377     1135        377     376     380     1133
            0.0106  0.0184  0.0265  0.0185      0.0159  0.0239  0.0711  0.0371
            0.0053  0.0069  0.0083  0.004       0.0064  0.0079  0.0132  0.0056
Bounds      1       3       4                   2       0       2
            127     129     256                 126     127     253
            0.0079  0.0233  0.0156              0.0159  0       0.0079
            NV      NV      0.0078              NV      NV      NV

                i                           a
REF     FA z    -2.154                      -2.413
        var     0.0391                      0.0659
MODS    H z     -2.304  -2.089  -1.935      -2.147  -1.978  -1.468
        var     0.0354  0.0233  0.0182      0.0262  0.0196  0.0094
        d'      -0.15   0.0653  0.2194      0.2667  0.4348  0.9453
        var     0.0744  0.0624  0.0572      0.092   0.0854  0.0753
        95%L    -0.685  -0.424  -0.249      -0.328  -0.138  0.4074
        95%R    0.3845  0.5548  0.6882      0.8614  1.0077  1.4831
ORIGIN  H z     -2.792  -1.936  -1.932      -2.086  -1.819  -1.569
        var     0.1044  0.0181  0.0182      0.0234  0.0152  0.0107
        d'      -0.638  0.2182  0.2217      0.3268  0.5946  0.8445
        var     0.1435  0.0572  0.0572      0.0892  0.081   0.0766
        95%L    -1.381  -0.251  -0.247      -0.259  0.0367  0.302
        95%R    0.1041  0.687   0.6906      0.9123  1.1526  1.387

by VOWEL   T-Dur

            u                                   All Vs
            2       4       6       Pool        2       4       6       Pool
High        2       3       5       10          6       6       6       18
            126     126     123     375         379     381     376     1136
            0.0159  0.0238  0.0407  0.0267      0.0158  0.0157  0.016   0.0158
            NV      NV      NV      0.0083      0.0064  0.0064  0.0065  0.0037
Med         2       6       7       15          7       11      20      38
            126     125     125     376         376     376     379     1131
            0.0159  0.048   0.056   0.0399      0.0186  0.0293  0.0528  0.0336
            NV      0.0191  0.0206  0.0101      0.007   0.0087  0.0115  0.0054
Low         1       2       19      22          2       10      42      54
            126     126     127     379         377     377     377     1131
            0.0079  0.0159  0.1496  0.058       0.0053  0.0265  0.1114  0.0477
            NV      NV      0.0317  0.012       NV      0.0083  0.0162  0.0063
Pool        5       11      31      47          15      27      68      110
            378     377     375     1130        1132    1134    1132    3398
            0.0132  0.0292  0.0827  0.0416      0.0133  0.0238  0.0601  0.0324
            0.0059  0.0087  0.0142  0.0059      0.0034  0.0045  0.0071  0.003
Bounds      0       2       2                   3       5       8
            127     127     254                 380     383     763
            0       0.0157  0.0079              0.0079  0.0131  0.0105
            NV      NV      NV                  NV      0.0058  0.0037

                u                           All Vs
REF     FA z    -2.415                      -2.309
        var     0.0658                      0.0176
MODS    H z     -2.219  -1.893  -1.387      -2.219  -1.981  -1.554
        var     0.0299  0.017   0.0087      0.01    0.0065  0.0035
        d'      0.1953  0.5217  1.0274      0.0897  0.3278  0.7543
        var     0.0957  0.0828  0.0745      0.0276  0.0241  0.0211
        95%L    -0.411  -0.042  0.4923      -0.236  0.0233  0.4694
        95%R    0.8017  1.0858  1.5625      0.4153  0.6323  1.0393
ORIGIN  H z     -1.932  -1.752  -1.571      -2.148  -1.83   -1.667
        var     0.0182  0.0138  0.0107      0.0087  0.0051  0.0041
        d'      0.4825  0.6628  0.8433      0.1602  0.4782  0.6414
        var     0.084   0.0796  0.0765      0.0263  0.0228  0.0217
        95%L    -0.086  0.1098  0.3011      -0.158  0.1824  0.3527
        95%R    1.0506  1.2158  1.3856      0.4783  0.7739  0.9301

APPENDIX  D 

ORIGINAL  CONTRAST  RESULTS 


The figures in this appendix present the results concerning the original
contrast factor. Their presentation is analogous to that of the graphs
concerning the contrast reduction factor, which are presented in Chap. 7.
As there, four sets of graphs are presented: Group (Fig. D.1), Group Pool
(Fig. D.2), Cue (Fig. D.3), and Vowel (Fig. D.4). The Group set comprises
4 graphs for the 4 subject groups (NN, NF, LN, and LF), and each graph plots
the group's use of the 4 cues tested. The Group Pool set is similar to the
Group set but eliminates the distinction between naives and linguists, so it
comprises 2 graphs, All Natives and All Foreign. The Cue set contains the
same data as the Group set, but it examines subject groups within cue rather
than cues within subject group, so it comprises 4 graphs for the 4 cues
(DAmp, DDur, TAmp, and TDur). The Vowel set is included for the sake of
thoroughness and comprises 4 graphs, /i, a, u/ and All Vs, with each graph
plotting the 4 cues for that vowel condition. Note that since the Vowel set
pools across all 4 subject groups, it encompasses, and presumably
camouflages, important differences in listening strategy across subject
groups.


Figure D.1. Original Contrast Results by Group (axes: d' Values, Contrast Reduction)

Figure D.2. Original Contrast Results by Group Pool (axes: d' Values, Contrast Reduction; legend: DAmp, DDur, TAmp, TDur)

Figure D.3. Original Contrast Results by Cue (axes: d' Values, Contrast Reduction)

Figure D.4. Original Contrast Results by Vowel (axes: d' Values, Contrast Reduction)


LIST  OF  REFERENCES 


Abramson,  A.  S.,  & Lisker,  L.  (1970).  Discriminability  along  the  Voicing 
Continuum:  Cross-Language  Tests.  Proceedings  of  the  Sixth  International 
Congress  of  Phonetic  Sciences,  Prague,  1967  (pp.  569-573).  Prague: 
Academia,  Publshing  House  of  the  Czechoslovak  Academy  of  Sciences. 

Abramson,  A.  S.,  & Lisker,  L.  (1985).  Relative  Power  of  Cues:  F0  Shift  Versus 
Voice  Timing.  In  V.  A.  Fromkin  (Ed.),  Phonetic  Linguistics:  Essays  in 
Honor  of  Peter  Ladefoged  (pp.  25-33).  Orlando,  FL:  Academic  Press. 

Abramson,  A.  S.,  Lisker,  L.,  & Koenig,  L.  (1990,  May).  Medial  Voicing 

Distinctions  in  English  Trochees.  Paper  presented  at  the  meeting  of  the 
Acoustical  Society  of  America,  Pennsylvania  State  University,  State 
College,  PA. 


Abry, C., Benoit, C., Boe, L. J., & Sock, R. (1985). Un Choix d'événements pour
l'organisation temporelle du signal de parole [A Choice of Events for the
Temporal Organization of the Speech Signal]. In Groupement des
Acousticiens de Langue Française, Groupe de la Communication Parlée
(GALF GCP) (Eds.), JEP: 14e Journées d'études sur la parole (pp. 133-137).
Paris: École Nationale Supérieure des Télécommunications (ENST).


Abry, C., Benoit, C., & Sock, R. (1985). Organisation segmentale et temporelle
du signal de parole en fonction de sa production [Segmental and
Temporal Organization of the Speech Signal as a Function of its
Production]. Grenoble, France: Institut de Phonétique de Grenoble.

André-Obrecht, R. (1988). A New Statistical Approach for the Automatic
Segmentation  of  Continuous  Speech  Signals.  IEEE  Transactions  on 
Acoustics,  Speech,  and  Signal  Processing,  36,  29-40. 

Bailey,  P.  J.,  & Summerfield,  Q.  (1980).  Information  in  Speech:  Observations 
on  the  Perception  of  [s]-Stop  Clusters.  Journal  of  Experimental  Psychology: 
Human  Perception  and  Performance,  6,  536-563. 

Bailly,  G.,  Benoit,  C.,  & Sawallis,  T.  R.  (Eds.).  (1992).  Talking  Machines: 
Theories, Models, and Designs. Amsterdam: North-Holland.

Baird,  J.  C.,  & Noma,  E.  (1978).  Fundamentals  of  Scaling  and  Psychophysics. 
New  York:  John  Wiley  & Sons. 

Barth,  J.  (1972).  Chimera.  Greenwich,  CT:  Fawcett. 

Beckman,  M.  E.  (Ed.).  (1993).  Phonetic  Development  [Special  issue].  Journal  of 
Phonetics,  21(1/2). 

Bell-Berti,  F.  (1975).  Control  of  Pharyngeal  Cavity  Size  for  English  Voiced  and 
Voiceless  Stops.  Journal  of  the  Acoustical  Society  of  America,  57,  456-461. 

Best,  C.  T.  (1995).  A Direct  Realist  View  of  Cross-Language  Speech  Perception. 
In  W.  Strange  (Ed.),  Speech  Perception  and  Linguistic  Experience:  Issues  in 
Cross-Language Research (pp. 171-204). Timonium, MD: York Press.

Borden,  G.  J.,  Harris,  K.  S.,  & Raphael,  L.  J.  (1994).  Speech  Science  Primer: 
Physiology,  Acoustics,  and  Perception  of  Speech  (3rd  ed.).  Baltimore,  MD: 
Williams  & Wilkins. 

Browman,  C.  P.,  & Goldstein,  L.  M.  (1986).  Towards  an  Articulatory 
Phonology.  Phonology  Yearbook,  3,  219-252. 

Browman,  C.  P.,  & Goldstein,  L.  (1990).  Tiers  in  Articulatory  Phonology,  with 
some  Implications  for  Casual  Speech.  In  J.  Kingston,  & M.  E.  Beckman 
(Eds.),  Papers  in  Laboratory  Phonology  I:  Between  the  Grammar  and 
Physics  of  Speech  (pp.  341-376).  Cambridge:  Cambridge  University  Press. 

Brunswik,  E.  (1955a).  Representative  Design  and  Probabilistic  Theory  in  a 
Functional  Psychology.  Psychological  Review,  62,  193-217. 

Brunswik,  E.  (1955b).  In  Defense  of  Probabilistic  Functionalism:  A Reply. 
Psychological  Review,  62,  236-242. 

Casagrande,  J.  (1984).  The  Sound  System  of  French.  Washington,  DC: 
Georgetown  University  Press. 

Chen,  M.  (1970).  Vowel  Length  Variation  as  a Function  of  the  Voicing  of  the 
Consonant  Environment.  Phonetica,  22,  129-159. 

Chomsky,  N.,  & Halle,  M.  (1968).  The  Sound  Pattern  of  English.  New  York: 
Harper  and  Row. 

Cooper, F. S., Delattre, P. C., Liberman, A. M., Borst, J. M., & Gerstman, L. J.
(1952).  Some  Experiments  on  the  Perception  of  Synthetic  Speech  Sounds. 
Journal  of  the  Acoustical  Society  of  America,  24,  597-606. 

Crystal,  T.  H.,  & House,  A.  S.  (1982).  Segmental  Duration  in  Connected  Speech 
Signals:  Preliminary  Results.  Journal  of  the  Acoustical  Society  of  America, 
72,  705-716. 

Delattre, P. (1951). Principes de phonétique française à l'usage des étudiants
anglo-américains. Middlebury, VT: The College Store, Middlebury College.

Delattre, P. (1953). Les modes phonétiques du français. The French Review, 27,
59-63.


Delattre, P. (1958). Les indices acoustiques de la parole: Premier rapport [The
Acoustic Cues of Speech: First Report]. Phonetica, 2, 108-118 & 226-251.

Delattre,  P.  C.,  Liberman,  A.  M.,  & Cooper,  F.  S.  (1955).  Acoustic  Loci  and 
Transitional  Cues  for  Consonants.  Journal  of  the  Acoustical  Society  of 
America,  27,  769-773. 

Diehl,  R.  L.,  & Kluender,  K.  R.  (1989).  On  the  Objects  of  Speech  Perception. 
Ecological  Psychology,  1,  121-144. 

Edwards,  T.  J.  (1981).  Multiple  Features  Analysis  of  Intervocalic  English 
Plosives.  Journal  of  the  Acoustical  Society  of  America,  69,  535-547. 

Faber,  A.,  & Di  Paolo,  M.  (1995).  The  Discriminability  of  Nearly  Merged 
Sounds.  Language  Variation  and  Change,  7,  35-78. 

Fant,  G.  (1960).  Acoustic  Theory  of  Speech  Production.  The  Hague:  Mouton. 


Feng,  G.,  Achab,  N.,  & Combescure,  P.  (1991).  On-line  Speech  Segmentation 
using  Adaptive  Models:  Application  to  Variable  Rate  Speech  Coding.  In 
Proceedings,  Eurospeech  91  (pp.  705-708).  Genova,  Italy:  European  Speech 
Communication  Association. 

Fienberg,  S.  E.  (1977).  The  Analysis  of  Cross-Classified  Categorical  Data. 
Cambridge,  MA:  MIT  Press. 

Fitch,  H.  L.,  Halwes,  T.,  Erickson,  D.  M.,  & Liberman,  A.  M.  (1980).  Perceptual 
Equivalence of Two Acoustic Cues for Stop-Consonant Manner.
Perception  & Psychophysics,  27,  343-350. 

Flege,  J.  E.  (1991).  Perception  and  Production:  The  Relevance  of  Phonetic 
Input  to  L2  Phonological  Learning.  In  T.  Huebner,  & C.  A.  Ferguson  (Eds.), 
Crosscurrents  in  Second  Language  Acquisition  and  Linguistic  Theories 
(pp.  249-289).  Amsterdam/Philadelphia:  John  Benjamins. 


Flege,  J.  E.,  & Hillenbrand,  J.  (1987).  A Differential  Effect  of  Release  Bursts  on 
the  Stop  Voicing  Judgments  of  Native  French  and  English  Listeners. 
Journal  of  Phonetics,  15,  203-208. 

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions (2nd ed.). New
York,  NY:  John  Wiley  & Sons. 

Fourcin,  A.  (1992).  Assessment  of  Synthetic  Speech.  In  G.  Bailly,  C.  Benoit,  & 
T.  R.  Sawallis  (Eds.),  Talking  Machines:  Theories,  Models,  and  Designs  (pp. 
431-434). Amsterdam: North-Holland.

Fowler,  C.  A.  (1986).  An  Event  Approach  to  the  Study  of  Speech  Perception 
from  a Direct-Realist  Perspective.  Journal  of  Phonetics,  14,  3-28. 

Fromkin,  V.  A.  (Ed.).  (1985).  Phonetic  Linguistics:  Essays  in  Honor  of  Peter 
Ladefoged.  Orlando,  FL:  Academic  Press. 

Green,  D.  M.,  & Swets,  J.  A.  (1966).  Signal  Detection  Theory  and 
Psychophysics.  New  York:  Wiley. 

Grosjean,  F.  (1982).  Life  with  Two  Languages:  An  Introduction  to 
Bilingualism.  Cambridge,  MA:  Harvard  University  Press. 

Harnad,  S.  (Ed.).  (1987).  Categorical  Perception:  The  Groundwork  of 
Cognition.  Cambridge:  Cambridge  University  Press. 

Harris,  C.  M.  (1953).  A Study  of  the  Building  Blocks  in  Speech.  Journal  of  the 
Acoustical  Society  of  America,  25,  962-969. 

Harris,  J.  H.  (1969).  Spanish  Phonology.  Cambridge,  MA:  MIT  Press. 

Harris,  K.  S.  (1958).  Cues  for  the  Discrimination  of  American  English 
Fricatives  in  Spoken  Syllables.  Language  and  Speech,  1,  1-7. 

Harris,  K.  S.,  Hoffman,  H.  S.,  Liberman,  A.  M.,  Delattre,  P.  C.,  & Cooper,  F.  S. 
(1958).  Effects  of  Third-Formant  Transitions  on  the  Perception  of  the 
Voiced  Stop  Consonants.  Journal  of  the  Acoustical  Society  of  America,  30, 
122-126. 

Hoffman,  H.  S.  (1958).  Study  of  Some  Cues  in  the  Perception  of  Voiced  Stop 
Consonants.  Journal  of  the  Acoustical  Society  of  America,  30,  1035-1041. 


International  Phonetic  Association.  (1949).  The  Principles  of  the  International 
Phonetic  Association.  London:  International  Phonetic  Association. 

Jakobson,  R.,  Fant,  C.  G.  M.,  & Halle,  M.  (1969).  Preliminaries  to  Speech 
Analysis:  The  Distinctive  Features  and  their  Correlates.  Cambridge,  MA: 
MIT  Press.  (Originally  published  in  1952  as  MIT  Acoustics  Laboratories 
Technical Report no. 13)


Keating,  P.  A.  (1985).  Universal  Phonetics  and  the  Organization  of  Grammars. 
In  V.  A.  Fromkin  (Ed.),  Phonetic  Linguistics:  Essays  in  Honor  of  Peter 
Ladefoged  (pp.  115-132).  Orlando,  FL:  Academic  Press. 

Keller,  E.  (1991).  Signalyze™  Version  2.0:  Signal  Analysis  for  Speech  and 
Sound.  User's  Manual.  Lausanne,  Switzerland/Seattle,  WA:  InfoSignal 
Inc. 

Keller, E. (1994). Signalyze™ Version 3.0: Signal Analysis for Speech and
Sound.  User's  Manual.  Lausanne,  Switzerland:  InfoSignal  Inc. 

Kelso,  J.  A.  S.,  Saltzman,  E.  L.,  & Tuller,  B.  (1986).  The  Dynamical  Perspective 
on  Speech  Production:  Data  and  Theory.  Journal  of  Phonetics,  14,  29-59. 

Klatt, D. H. (1975). Voice Onset Time, Frication, and Aspiration in Word-Initial
Consonant Clusters. Journal of Speech and Hearing Research, 18,
686-706.

Klatt, D. H. (1980). Software for a Cascade/Parallel Formant Synthesizer.
Journal of the Acoustical Society of America, 67, 971-995.

Kuhl,  P.  K.  (1992).  Psychoacoustics  and  Speech  Perception:  Internal  Standards, 
Perceptual  Anchors,  and  Prototypes.  In  L.  A.  Werner,  & E.  W.  Rubel  (Eds.), 
Developmental  Psychoacoustics  (pp.  293-332).  Washington,  DC:  American 
Psychological  Association. 

Laeufer,  C.  (1992).  Patterns  of  Voicing-Conditioned  Vowel  Duration  in  French 
and  English.  Journal  of  Phonetics,  20,  411-440. 

Landahl, K. L., & Ziolkowski, M. S. (1995). Discovering Phonetic Units: Is a
Picture Worth a Thousand Words? In A. Dainora, R. Hemphill, B. S. Luka,
B. Need, & S. Pargman (Eds.), CLS 31: Papers from the 31st Regional
Meeting of the Chicago Linguistic Society: Vol. 1. The Main Session.
Chicago: Chicago Linguistic Society.

Lehiste,  I.  (Ed.).  (1967).  Readings  in  Acoustic  Phonetics.  Cambridge,  MA:  MIT 
Press. 

Lehmann,  W.  P.,  & Malkiel,  Y.  (Eds.).  (1968).  Directions  for  Historical 
Linguistics.  Austin,  TX:  University  of  Texas  Press. 

Liberman,  A.  M.  (1957).  Some  Results  of  Research  on  Speech  Perception. 
Journal  of  the  Acoustical  Society  of  America,  29,  117-123. 

Liberman,  A.  M.  (1993).  Some  Assumptions  about  Speech  and  How  They 
Changed.  Haskins  Laboratories  Status  Report  on  Speech  Research,  SR-113, 
1-32. 


Liberman,  A.  M.,  Delattre,  P.,  & Cooper,  F.  S.  (1952).  The  Role  of  Selected 
Stimulus-Variables  in  the  Perception  of  the  Unvoiced  Stop  Consonants. 
American  Journal  of  Psychology,  65,  497-516. 


Liberman,  A.  M.,  Delattre,  P.  C.,  & Cooper,  F.  S.  (1958).  Some  Cues  for  the 
Distinction  between  Voiced  and  Voiceless  Stops  in  Initial  Position. 
Language  and  Speech,  1,  153-167. 


Liberman,  A.  M.,  Delattre,  P.  C.,  Cooper,  F.  S.,  & Gerstman,  L.  J.  (1954).  The 
Role  of  Consonant-Vowel  Transitions  in  the  Perception  of  the  Stop  and 
Nasal  Consonants.  Psychological  Monographs,  68(Whole  No.  379),  1-13. 

Liberman,  A.  M.,  Delattre,  P.  C.,  Gerstman,  L.  J.,  & Cooper,  F.  S.  (1956).  Tempo 
of  Frequency  Change  as  a Cue  for  Distinguishing  Classes  of  Speech 
Sounds.  Journal  of  Experimental  Psychology,  52,  127-137. 

Liberman,  A.  M.,  Ingemann,  F.,  Lisker,  L.,  Delattre,  P.,  & Cooper,  F.  S.  (1959). 
Minimal  Rules  for  Synthesizing  Speech.  Journal  of  the  Acoustical  Society 
of  America,  31,  1490-1499. 

Liberman,  A.  M.,  & Mattingly,  I.  G.  (1985).  The  Motor  Theory  of  Speech 
Perception  Revised.  Cognition,  21,  1-36. 

Lieberman, P., & Blumstein, S. E. (1988). Speech Physiology, Speech
Perception and Acoustic Phonetics. Cambridge: Cambridge University
Press.

Lisker,  L.  (1957a).  Closure  Duration  and  the  Intervocalic  Voiced-Voiceless 
Distinction in English. Language, 33, 42-49.

Lisker, L. (1957b). Minimal Cues for Separating /w, r, l, y/ in Intervocalic
Position.  Word,  13,  256-267. 


Lisker,  L.  (1977).  Factors  in  the  Maintenance  and  Cessation  of  Voicing. 
Phonetica,  34,  304-306. 


Lisker, L. (1978). Rapid vs. Rabid: A Catalogue of Acoustic Features that May
Cue  the  Distinction.  Haskins  Laboratories  Status  Report  on  Speech 
Research,  SR-54,  127-132. 

Lisker,  L.  (1986).  "Voicing"  in  English:  A Catalogue  of  Acoustic  Features 
Signaling  /b/  versus  /p/  in  Trochees.  Language  and  Speech,  29,  3-11. 

Lisker,  L.,  & Abramson,  A.  S.  (1964).  A Cross-Language  Study  of  Voicing  in 
Initial  Stops:  Acoustical  Measurements.  Word,  20,  384-422. 

Lisker,  L.,  & Abramson,  A.  S.  (1970).  The  Voicing  Dimension:  Some 
Experiments  in  Comparative  Phonetics.  Proceedings  of  the  Sixth 
International  Congress  of  Phonetic  Sciences,  Prague,  1967  (pp.  563-567). 
Prague:  Academia,  Publishing  House  of  the  Czechoslovak  Academy  of 
Sciences. 

Logan,  J.  S.,  Lively,  S.  E.,  & Pisoni,  D.  B.  (1991).  Training  Japanese  Listeners  to 
Identify  English  /r/  and  /l/:  A First  Report.  Journal  of  the  Acoustical 
Society  of  America,  89,  874-886. 

Macmillan,  N.  A.  (1987).  Beyond  the  Categorical/Continuous  Distinction:  A 
Psychophysical  Approach  to  Processing  Modes.  In  S.  Harnad  (Ed.), 
Categorical  Perception:  The  Groundwork  of  Cognition  (pp.  53-85). 
Cambridge:  Cambridge  University  Press. 

Macmillan,  N.  A.,  & Creelman,  C.  D.  (1991).  Detection  Theory:  A User's 
Guide.  Cambridge:  Cambridge  University  Press. 

Maddieson,  I.  (1984).  Patterns  of  Sounds.  Cambridge:  Cambridge  University 
Press. 

Malecot,  A.  (1956).  Acoustic  Cues  for  Nasal  Consonants:  An  Experimental 
Study  Involving  a Tape-Splicing  Technique.  Language,  32,  274-284. 

Massaro,  D.  W.  (1987).  Speech  Perception  by  Ear  and  Eye:  A Paradigm  for 
Psychological Inquiry. Hillsdale, NJ: Lawrence Erlbaum Associates.

Massaro,  D.  W.  (1992).  Broadening  the  Domain  of  the  Fuzzy  Logical  Model  of 
Perception.  In  H.  L.  Pick,  P.  W.  van  den  Broek,  & D.  C.  Knill  (Eds.), 
Cognition:  Conceptual  and  Methodological  Issues  (pp.  51-84).  Washington, 
DC: American Psychological Association.

Massaro,  D.  W.,  Cohen,  M.  M.,  Gesi,  A.,  Heredia,  R.,  & Tsuzaki,  M.  (1993). 
Bimodal  Speech  Perception:  An  Examination  Across  Languages.  Journal  of 
Phonetics,  21,  445-478. 

McCarthy,  J.  J.  (1981).  A Prosodic  Theory  of  Nonconcatenative  Morphology. 
Linguistic  Inquiry,  12,  373-418. 

McGurk,  H.,  & MacDonald,  J.  (1976).  Hearing  Lips  and  Seeing  Voices.  Nature, 
264,  746-748. 

McNicol,  D.  (1972).  A Primer  of  Signal  Detection  Theory.  London:  George 
Allen  & Unwin. 

Mendenhall,  W.,  & Beaver,  R.  J.  (1991).  Introduction  to  Probability  and 
Statistics  (8th  ed.).  Boston,  MA:  PWS-Kent  Publishing  Co. 

Mermelstein, P. (1973). Articulatory Model for the Study of Speech
Production. Journal of the Acoustical Society of America, 53, 1070-1082.

Miller,  J.  L.  (1994).  On  the  Internal  Structure  of  Phonetic  Categories:  A 
Progress  Report.  Cognition,  50,  271-285. 

Miller,  J.  L.,  & Eimas,  P.  D.  (1995).  Speech  Perception:  From  Signal  to  Word. 
Annual  Review  of  Psychology,  46,  467-492. 

Miller, J. L., Kent, R. D., & Atal, B. S. (Eds.). (1991). Papers in Speech
Communication: Speech Perception. Woodbury, NY: Acoustical Society of
America.

Muysken,  P.,  & Smith,  N.  (Eds.).  (1986).  Substrata  versus  Universals  in  Creole 
Genesis.  Amsterdam:  John  Benjamins  Publishing  Company. 

Nearey,  T.  M.  (1990).  The  Segment  as  a Unit  of  Speech  Perception.  Journal  of 
Phonetics,  18,  347-373. 

Nearey,  T.  M.  (1991).  Perception:  Automatic  and  Cognitive  Processes.  In 
Proceedings of the XIIth International Congress of Phonetic Sciences (Vol.
1, pp. 40-49). Aix-en-Provence, France: Université de Provence.

Nearey,  T.  M.  (1992).  Context  Effects  in  a Double-Weak  Theory  of  Speech 
Perception.  Language  and  Speech,  35,  153-171. 

Nearey,  T.  M.,  & Hogan,  J.  T.  (1986).  Phonological  Contrast  in  Experimental 
Phonetics:  Relating  Distributions  of  Production  Data  to  Perceptual 
Categorization  Curves.  In  J.  J.  Ohala,  & J.  J.  Jaeger  (Eds.),  Experimental 
Phonology  (pp.  141-161).  Orlando,  FL:  Academic  Press. 

Nosofsky,  R.  M.  (1988).  Similarity,  Frequency,  and  Category  Representations. 
Journal  of  Experimental  Psychology:  Learning,  Memory,  and  Cognition, 
14,  54-65. 


O'Connor, J. D., Gerstman, L. J., Liberman, A. M., Delattre, P. C., & Cooper,
F. S. (1957). Acoustic Cues for the Perception of Initial /w, j, r, l/ in English.
Word,  13,  24-43. 

Ohala,  J.  J.  (1981).  The  Listener  as  the  Source  of  Sound  Change.  In  C.  S.  Masek, 
R.  A.  Hendrick,  & M.  F.  Miller  (Eds.),  Papers  from  the  Parasession  on 
Language  and  Behavior  (pp.  178-203).  Chicago:  Chicago  Linguistic  Society. 

Ohala,  J.  J.  (1992).  What's  Cognitive,  What's  Not,  in  Sound  Change.  In  G. 
Kellermann,  & M.  D.  Morrissey  (Eds.),  Diachrony  Within  Synchrony: 
Language History and Cognition (Duisburger Arbeiten zur Sprach- und
Kulturwissenschaft 14) (pp. 309-355). Frankfurt: Peter Lang Verlag.
(Reprinted  in  Lingua  e Stile,  1992,  27,  321-362) 

Ohala, J. J., & Jaeger, J. J. (Eds.). (1986). Experimental Phonology. Orlando, FL:
Academic  Press. 

Parker,  F.  (1977).  Distinctive  Features  and  Acoustic  Cues.  Journal  of  the 
Acoustical  Society  of  America,  62,  1051-1054. 

Port,  R.,  & Crawford,  P.  (1989).  Incomplete  Neutralization  and  Pragmatics  in 
German.  Journal  of  Phonetics,  17,  257-282. 


Raphael,  L.  J.,  Tobin,  Y.,  Faber,  A.,  Most,  T.,  Kollia,  H.  B.,  & Milstein,  D.  (1995). 
Intermediate  Values  of  Voice  Onset  Time.  In  F.  Bell-Berti,  & L.  J.  Raphael 
(Eds.),  Producing  Speech:  Contemporary  Issues  for  Katherine  Safford 
Harris  (pp.  117-127).  New  York:  AIP  Press. 

Remez,  R.  E.  (1994).  A Guide  to  Research  on  the  Perception  of  Speech.  In  M. 
A.  Gernsbacher  (Ed.),  Handbook  of  Psycholinguistics  (pp.  145-172).  San 
Diego:  Academic  Press. 

Repp,  B.  H.  (1979).  Relative  Amplitude  of  Aspiration  Noise  as  a Voicing  Cue 
for  Syllable-Initial  Stop  Consonants.  Language  and  Speech,  22,  173-189. 

Repp,  B.  H.  (1982).  Phonetic  Trading  Relations  and  Context  Effects:  New 
Experimental  Evidence  for  a Speech  Mode  of  Perception.  Psychological 
Bulletin,  91,  81-110. 

Repp,  B.  H.  (1983).  Trading  Relations  among  Acoustic  Cues  in  Speech 
Perception  are  Largely  a Result  of  Phonetic  Categorization.  Speech 
Communication,  2,  341-361. 

Repp,  B.  H.  (1984).  Closure  Duration  and  Release  Burst  Amplitude  Cues  to 
Stop  Consonant  Manner  and  Place  of  Articulation.  Language  and  Speech, 
27,  245-254. 

Rosch,  E.  (1977).  Human  Categorization.  In  N.  Warren  (Ed.),  Studies  in  Cross- 
Cultural  Psychology  (Vol.  1,  pp.  1-49).  London:  Academic  Press. 

Rosch,  E.  (1978).  Principles  of  Categorization.  In  E.  Rosch,  & B.  B.  Lloyd  (Eds.), 
Cognition  and  Categorization  (pp.  27-48).  Hillsdale,  NJ:  Lawrence  Erlbaum 
Associates. 

Schatz, C. D. (1954). The Role of Context in the Perception of Stops. Language,
30,  47-56. 


Serniclaes, W. (1975-1976). Prevoicement et délai d'établissement du
voisement: deux indices indépendants pour la perception des occlusives
[Prevoicing and Voice Onset Time: Two Independent Cues for the
Perception of Stops]. Rapport d'Activités de l'Institut de Phonétique
(Université Libre de Bruxelles), 10, 83-104.


Serniclaes, W. (1978-1979). Sur la dissociation entre périodicité, bruit, et
fréquence fondamentale en tant qu'indices de voisement des occlusives
du français [On the Dissociation between Periodicity, Noise, and
Fundamental Frequency as Cues for Voicing of the Stops of French].
Rapport d'Activités de l'Institut de Phonétique (Université Libre de
Bruxelles), 13, 71-93.


Sommers, M. S., Nygaard, L. C., & Pisoni, D. B. (1992). Stimulus Variability
and the Perception of Spoken Words: Effects of Variation in Speaking Rate
and Overall Amplitude. In J. J. Ohala, T. M. Nearey, B. L. Derwing, M. M.
Hodge, & G. E. Wiebe (Eds.), ICSLP 92 Proceedings: 1992 International
Conference on Spoken Language Processing (Vol. 1, pp. 217-220).
Edmonton, Alberta, Canada: University of Alberta.

Stathopoulos,  E.  T.,  & Weismer,  G.  (1983).  Closure  Duration  of  Stop 
Consonants.  Journal  of  Phonetics,  11,  395-400. 

Stevens,  S.  S.  (1946).  On  the  Theory  of  Scales  of  Measurement.  Science,  103, 
677-680. 

Sussman,  J.  E.,  & Lauckner-Morano,  V.  J.  (1995).  Further  Tests  of  the 
"Perceptual  Magnet  Effect"  in  the  Perception  of  [i]:  Identification  and 
Change/No-Change  Discrimination.  Journal  of  the  Acoustical  Society  of 
America,  97,  539-552. 

Thurstone,  L.  L.  (1927).  A Law  of  Comparative  Judgment.  Psychological 
Review,  34,  273-286. 

Tranel,  B.  (1987).  The  Sounds  of  French:  An  Introduction.  Cambridge: 
Cambridge  University  Press. 

Trubetzkoy,  N.  S.  (1939/1969).  Principles  of  Phonology  (C.  A.  M.  Baltaxe, 
Trans.).  Berkeley  & Los  Angeles:  University  of  California  Press.  (Original 
work  published  1939) 

Weinreich,  U.,  Labov,  W.,  & Herzog,  M.  I.  (1968).  Empirical  Foundations  for  a 
Theory  of  Language  Change.  In  W.  P.  Lehmann,  & Y.  Malkiel  (Eds.), 
Directions  for  Historical  Linguistics.  Austin,  TX:  University  of  Texas  Press. 

Whalen, D. H. (1989). Vowel and Consonant Judgments Are Not Independent
When  Cued  by  the  Same  Information.  Perception  & Psychophysics,  46, 
284-292. 

Whalen,  D.  H.,  Abramson,  A.  S.,  Lisker,  L.,  & Mody,  M.  (1988).  Fundamental 
Frequency  Provides  Voicing  Information  Even  With  Unambiguous  VOTs. 
Journal  of  the  Acoustical  Society  of  America,  84  (Suppl.  1),  S155-S156. 


BIOGRAPHICAL  SKETCH 


I was  born  and  raised  in  Ft.  Lauderdale,  Florida,  and  attended  Wilton 
Manors  Elementary  and  Nova  High  School  in  Davie.  I was  awarded  a 
National  Merit  Scholarship  to  attend  Stetson  University,  where  I received  my 
BA  in  1977  with  a major  in  English  and  substantial  course  work  in  French. 
After  spending  3 years  as  a technical  writer  in  the  Washington,  DC,  area,  I 
went to the Université des Langues et Lettres (now Université Stendhal) in
Grenoble, France, for language experience, adventure, and an introduction to
linguistics. I got the language experience and the adventure I was looking for,
but for better or worse, I did not stop at the introduction to linguistics. I left
Grenoble three years later as a phonetician, having earned a Maîtrise des
Sciences  du  Langage.  I entered  graduate  school  at  the  University  of  Florida,  in 
the  Program  in  Linguistics.  In  1990  I returned  to  Grenoble  with  a Bourse 
Chateaubriand  (a  French  Government  fellowship)  to  do  the  research  reported 
in  this  dissertation. 

I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms 
to  acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in 
scope  and  quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


William J. Sullivan, Chair
Associate Professor of Linguistics


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms 
to  acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in 
scope  and  quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 



Caroline  Wiltshire,  Cochair 
Assistant  Professor  of  Linguistics 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms 
to  acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in 
scope  and  quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Professor  of  Communication 
Processes  and  Disorders 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms 
to  acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in 
scope  and  quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


I certify that I have read this study and that in my opinion it conforms
to  acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in 
scope  and  quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Ira Fischler
Professor  of  Psychology 


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Program 
in  Linguistics  in  the  College  of  Liberal  Arts  and  Sciences  and  to  the  Graduate 
School  and  was  accepted  as  partial  fulfillment  of  the  requirements  for  the 
degree  of  Doctor  of  Philosophy. 


August,  1996 


Dean,  Graduate  School 


